In the past five years, gun violence has remained a major concern for New Yorkers. Surveys consistently show that the vast majority of residents consider crime to be a very or somewhat serious issue, with many expressing skepticism about progress in reducing gun violence. At the same time, others note that overall crime rates have been trending downward, with some arguing that New York City feels safer than in previous years.
Amid this debate, public agencies continue to prioritize transparency, accountability, and data-driven decision-making. One area where this approach is especially valuable is in crime reporting and prevention.
This report analyzes NYPD shooting incident data to uncover trends, patterns, and factors that shape the distribution of gun violence. Specifically, the analysis focuses on data from 2020 through 2024, with the objectives of identifying geographic hotspots, temporal patterns, and demographic trends.
The dataset used in this analysis is publicly available through NYC Open Data, with further details provided in the Data section. The report will briefly describe the structure and types of data included, but it will not dive into technical details of data cleaning or coding. Instead, the focus is on presenting clear insights through exploratory data analysis (EDA), visualization, and statistical summaries.
The following sections detail the dataset, analytical approach, results, and key recommendations.
About the Data
This analysis uses the NYPD Shooting Incident Data – Historic, which is publicly available through NYC Open Data. The dataset contains detailed records of shooting incidents reported in New York City. For this report, the focus is on the years 2020 through 2024.
At the time of import, the full dataset (all available years) included 29,744 records (rows) and 21 columns (variables). After preparing and cleaning the data (e.g., converting dates into usable formats, handling missing values, standardizing variable types, and dropping a few rows with invalid age-group codes), the final dataset contained 29,738 records. The cleaning process was carried out in code and is not covered in this report, since the intended audience is non-technical.
Data Structure
Each row in the dataset represents a single shooting incident. The key information captured includes:
Incident details
INCIDENT_KEY: Unique identifier for each incident
OCCUR_DATE and OCCUR_TIME: Date and time of the incident
BORO and PRECINCT: Location by borough and police precinct
JURISDICTION_CODE: Indicates whether the incident fell under NYPD (Patrol, Transit, Housing) or non-NYPD jurisdiction
Participants
PERP_AGE_GROUP, PERP_SEX, PERP_RACE: Age group, sex, and race of the perpetrator
VIC_AGE_GROUP, VIC_SEX, VIC_RACE: Age group, sex, and race of the victim
Incident outcomes
STATISTICAL_MURDER_FLAG: Indicates if the shooting resulted in a death
Location details
LOCATION_DESC: Description of where the incident occurred
X_COORD_CD, Y_COORD_CD, Latitude, Longitude, Lon_Lat: Geographic coordinates of the incident
Notes on Data Use
The dataset includes both categorical information (e.g., borough, sex, race, age group) and numeric/geographic information (e.g., coordinates, dates, times).
While the raw dataset contained some missing or inconsistent values, these were addressed during data preparation. The details of this process are available in the underlying code, but are not presented in this report.
The goal of this section is to provide an overview of what the data contains, rather than the technical steps of cleaning.
Table 1: Sample of NYPD Shooting Incident Data
| INCIDENT_KEY | OCCUR_DATE | OCCUR_TIME | BORO | LOC_OF_OCCUR_DESC | PRECINCT | JURISDICTION_CODE | LOC_CLASSFCTN_DESC | LOCATION_DESC | STATISTICAL_MURDER_FLAG | PERP_AGE_GROUP | PERP_SEX | PERP_RACE | VIC_AGE_GROUP | VIC_SEX | VIC_RACE | X_COORD_CD | Y_COORD_CD | Latitude | Longitude | Lon_Lat | YEAR | HOUR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 231974218 | 2021-08-09 00:00:00 | 01:06:00 | Bronx | Unknown | 40 | NYPD Patrol | Unknown | Unknown | False | Unknown | Unknown | Unknown | 18-24 | Male | Black | 1006343.000000 | 234270.000000 | 40.809673 | -73.920193 | POINT (-73.92019278899994 40.80967347200004) | 2021 | 1 |
| 177934247 | 2018-04-07 00:00:00 | 19:48:00 | Brooklyn | Unknown | 79 | NYPD Patrol | Unknown | Unknown | True | 25-44 | Male | White Hispanic | 25-44 | Male | Black | 1000082.937500 | 189064.671875 | 40.685610 | -73.942913 | POINT (-73.94291302299996 40.685609672000055) | 2018 | 19 |
| 255028563 | 2022-12-02 00:00:00 | 22:57:00 | Bronx | Outside | 47 | NYPD Patrol | Street | Grocery/Bodega | False | Unknown | Unknown | Unknown | 25-44 | Male | Black | 1020691.000000 | 257125.000000 | 40.872349 | -73.868233 | POINT (-73.868233 40.872349) | 2022 | 22 |
| 25384540 | 2006-11-19 00:00:00 | 01:50:00 | Brooklyn | Unknown | 66 | NYPD Patrol | Unknown | Pvt House | True | Unknown | Unknown | Unknown | 18-24 | Male | Black | 985107.312500 | 173349.796875 | 40.642490 | -73.996912 | POINT (-73.99691224999998 40.642489932000046) | 2006 | 1 |
| 72616285 | 2010-05-09 00:00:00 | 01:58:00 | Bronx | Unknown | 46 | NYPD Patrol | Unknown | Multi Dwell - Apt Build | True | 25-44 | Male | Black | <18 | Female | Black | 1009853.500000 | 247502.562500 | 40.845984 | -73.907461 | POINT (-73.90746098599993 40.84598358900007) | 2010 | 1 |
| 85875439 | 2012-07-22 00:00:00 | 21:35:00 | Bronx | Unknown | 42 | NYPD Housing | Unknown | Multi Dwell - Public Hous | False | 18-24 | Male | Black | 18-24 | Male | Black | 1011046.687500 | 239814.234375 | 40.824878 | -73.903179 | POINT (-73.90317908399999 40.82487781900005) | 2012 | 21 |
| 79780323 | 2011-07-12 00:00:00 | 22:26:00 | Brooklyn | Unknown | 71 | NYPD Patrol | Unknown | Unknown | True | Unknown | Unknown | Unknown | 25-44 | Male | Black | 995125.687500 | 178185.546875 | 40.655756 | -73.960805 | POINT (-73.96080480099994 40.65575638400003) | 2011 | 22 |
| 85744504 | 2012-07-14 00:00:00 | 23:45:00 | Brooklyn | Unknown | 69 | NYPD Housing | Unknown | Multi Dwell - Public Hous | False | Unknown | Unknown | Unknown | 25-44 | Male | White Hispanic | 1013655.250000 | 177160.609375 | 40.652901 | -73.894028 | POINT (-73.89402764099998 40.652901014000065) | 2012 | 23 |
| 142324890 | 2015-04-21 00:00:00 | 15:36:00 | Brooklyn | Unknown | 75 | NYPD Patrol | Unknown | Multi Dwell - Apt Build | False | 25-44 | Male | Black | 25-44 | Male | Black | 1012960.062500 | 182221.609375 | 40.666795 | -73.896511 | POINT (-73.89651148199994 40.66679461900003) | 2015 | 15 |
| 152868707 | 2016-05-07 00:00:00 | 15:23:00 | Brooklyn | Unknown | 69 | NYPD Patrol | Unknown | Unknown | False | 18-24 | Male | Black | 18-24 | Male | Black | 1011467.000000 | 172461.000000 | 40.640009 | -73.901933 | POINT (-73.90193284499998 40.64000860200008) | 2016 | 15 |
Overall Trends
Figure 1:
This chart shows the total number of shooting incidents reported in New York City from 2020 through 2024.
The highest count occurred in 2021 with 2,011 incidents.
2020 followed closely with 1,948 incidents.
Shootings have steadily declined since then, reaching 1,181 incidents in 2024, the lowest in the period.
Takeaway: Shootings surged during the early pandemic years but have steadily declined since 2021, showing measurable progress in reducing gun violence.
Figure 2:
This chart shows how shootings changed year to year compared with the previous year.
From 2020 to 2021, shootings increased by 3.2%.
Incidents then fell sharply: –14.7% in 2022 and –27.2% in 2023, the steepest decline in the five-year period.
In 2024, shootings continued to decrease, though at a slower pace (–5.4%).
Takeaway: While shootings rose slightly in 2021, the following years saw consistent and significant declines, though the pace of improvement slowed in 2024.
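For readers who want to check the arithmetic, the year-over-year figures follow directly from the annual counts. The sketch below is illustrative only: the variable names are assumptions rather than the report's own code, and it uses just the two consecutive annual counts that are stated in the text (the 2022 and 2023 counts are not quoted, so they are left out).

```python
import pandas as pd

# Annual incident counts quoted in this report for two consecutive years.
# (Illustrative variable name; not taken from the report's source code.)
yearly_counts = pd.Series({2020: 1948, 2021: 2011}, name="incidents")

# Percent change relative to the previous year.
# The first year has no predecessor, so its value is NaN.
yoy_pct = (yearly_counts.pct_change() * 100).round(1)
print(yoy_pct)
```

With a full series of annual counts, the same `pct_change` call yields one percentage per year, which is what a chart like Figure 2 plots.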
Figure 3:
This line chart compares all shooting incidents with those that resulted in a victim’s death.
Total shootings peaked in 2021 at just over 2,000 incidents, before declining to 1,181 in 2024.
Fatal shootings followed a similar pattern: a small rise in 2021, then steady decreases each year, reaching 239 deaths in 2024.
Takeaway: Both overall shootings and fatal shootings have declined since 2021, meaning fewer incidents and fewer deaths in absolute terms.
Geographic Analysis
Figure 1:
This chart breaks down total shootings by borough from 2020 through 2024.
The Bronx showed a small increase in 2024, rising to 458 incidents, while Manhattan also ticked upward to 215 incidents.
In contrast, Brooklyn (338 incidents) and Queens (155 incidents) followed steady downward trends.
Staten Island remained the lowest and relatively stable, with a small decline in 2024.
Takeaway: Most boroughs are seeing long-term declines, but localized increases in the Bronx and Manhattan suggest areas where targeted interventions are most needed.
Figure 2:
Table 2: Top 10 Precincts by Shooting Incidents — 2024
| Police Precinct | Borough | Number of Incidents | % of Citywide Shootings | Cumulative % of Shootings |
|---|---|---|---|---|
| 44 | Bronx | 83 | 7.0% | 7.0% |
| 46 | Bronx | 72 | 6.1% | 13.1% |
| 73 | Brooklyn | 61 | 5.2% | 18.3% |
| 40 | Bronx | 55 | 4.7% | 23.0% |
| 75 | Brooklyn | 52 | 4.4% | 27.4% |
| 42 | Bronx | 46 | 3.9% | 31.3% |
| 47 | Bronx | 42 | 3.6% | 34.9% |
| 52 | Bronx | 41 | 3.5% | 38.4% |
| 48 | Bronx | 38 | 3.2% | 41.6% |
| 43 | Bronx | 35 | 3.0% | 44.6% |
This table lists the 10 police precincts with the highest number of shooting incidents in 2024.
The 44th Precinct (Bronx) recorded the most, with 83 shootings (7.0% of citywide total).
The 46th Precinct (Bronx) ranked second, with 72 incidents (6.1%).
Together, just these two precincts accounted for 13.1% of all shootings in 2024.
Across the list, 8 of the 10 precincts are in the Bronx and the other 2 are in Brooklyn, underscoring the Bronx's concentration of incidents.
Collectively, the top 10 precincts made up nearly 45% of citywide shootings in 2024.
Takeaway: Gun violence is highly concentrated, with the Bronx dominating the top precincts and nearly half of all 2024 incidents coming from only 10 areas.
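The share and cumulative-share columns in Table 2 can be reproduced from the incident counts. The following is a minimal sketch, not the report's own code: it assumes the 2024 citywide total of 1,181 reported under Overall Trends as the denominator, and the variable names are illustrative. Because it cumulates unrounded shares and rounds once at the end, its cumulative column can differ from Table 2 by about a tenth of a percentage point, depending on when rounding is applied.

```python
import pandas as pd

# 2024 incident counts for the ten precincts listed in Table 2.
top10 = pd.Series(
    {44: 83, 46: 72, 73: 61, 40: 55, 75: 52, 42: 46, 47: 42, 52: 41, 48: 38, 43: 35},
    name="incidents",
)
CITYWIDE_2024 = 1181  # assumed denominator: 2024 total from Overall Trends

share = top10 / CITYWIDE_2024 * 100
summary = pd.DataFrame(
    {
        "incidents": top10,
        "pct_citywide": share.round(1),
        # Cumulate the unrounded shares, then round once at the end.
        "cumulative_pct": share.cumsum().round(1),
    }
)
print(summary)
```

On the full dataset, the counts themselves would typically come from something like `value_counts()` on the precinct column for the selected year.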
Figure 3:
About this Map (How to Read & Use It)
This interactive map shows locations of NYPD shooting incidents for each year from 2020 to 2024. For each year, the map highlights the Top 10 police precincts with the most incidents that year.
What the colors mean
- Each color represents a different precinct (within the year you’ve selected).
- Points show where incidents occurred within those precincts.
How to use the map
1. Switch years: Use the layers box in the top-right to toggle a year on or off (e.g., “2024 • Top 10 precincts”). One year or several years can be shown at once.
2. Zoom & pan: Use the + / − buttons or your mouse/touchpad to zoom in and drag to move around the city.
3. See details:
   - Hover over a point to see a quick summary (precinct, borough, date, and whether it resulted in death).
   - Click a point to open a small popup with the same information.
4. Clusters: When zoomed out, nearby points are grouped together for readability.
   - Zoom in to separate them and see individual incidents.
   - At close zoom levels, points will “uncluster” automatically.
What you can learn at a glance
- Where shootings concentrate within the Top 10 precincts in a given year.
- How patterns shift across years by toggling different year layers.
- Context for potential hot spots (e.g., clusters of points within specific neighborhoods/precincts).
Notes
- This map focuses on the Top 10 precincts per year (not all precincts) so you can quickly see where incidents are most concentrated.
- Data comes from NYPD Shooting Incident Data – Historic (NYC Open Data).
Temporal Analysis
Figure 1:
This chart shows monthly shooting incidents for each year between 2020 and 2024.
2020–2022: Shootings peaked in the summer months (June–August), often exceeding 200 incidents.
2023–2024: Summer spikes became smaller, staying below 200 incidents, and monthly trends appeared more stable.
In 2024, incidents stayed fairly steady throughout the year, with only a dip in August and a rebound in September.
Takeaway: Shootings were strongly seasonal in earlier years, but in recent years the summer spikes have weakened, suggesting an overall stabilization of monthly patterns.
Figure 2:
This chart shows how shootings are distributed by time of day for each year between 2020 and 2024.
The pattern across all years forms a U-shape, with peaks during late-night and evening hours.
Early mornings consistently recorded the fewest incidents.
In 2020–2022, late-night and evening peaks often exceeded 100 incidents.
In 2023–2024, these peaks dropped below 100, except for a late-night spike in 2023.
Takeaway: Shootings remain concentrated at night, but overall frequency at peak hours has decreased since 2023.
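A time-of-day breakdown like this one amounts to counting incidents per hour within each year. The sketch below shows one way to do that with pandas. The four sample rows are hypothetical and stand in for the real incident table; only the column names `OCCUR_DATE`, `OCCUR_TIME`, `YEAR`, and `HOUR` come from the dataset described above.

```python
import pandas as pd

# Hypothetical sample rows; the real table has one row per incident.
sample = pd.DataFrame(
    {
        "OCCUR_DATE": pd.to_datetime(
            ["2023-07-01", "2023-07-01", "2024-01-15", "2024-08-03"]
        ),
        "OCCUR_TIME": ["23:10:00", "01:45:00", "23:59:00", "14:05:00"],
    }
)

sample["YEAR"] = sample["OCCUR_DATE"].dt.year
# Parse "HH:MM:SS" strings and keep only the hour (0 covers 00:00 to 00:59).
sample["HOUR"] = pd.to_datetime(sample["OCCUR_TIME"], format="%H:%M:%S").dt.hour

# Rows are years, columns are hours 0-23; hours with no incidents become 0.
hourly = (
    sample.groupby(["YEAR", "HOUR"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=range(24), fill_value=0)
)
print(hourly.loc[:, [1, 14, 23]])
```

Reindexing to all 24 hours keeps quiet hours visible as zeros, which matters when comparing the shape of the curve across years.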
Demographic Analysis
Figure 1:
This chart compares the age groups of perpetrators and victims in shootings from 2020 through 2024.
Perpetrators: The largest category is consistently “Unknown”, followed by ages 25–44.
Victims: Most are aged 25–44, followed by 18–24.
Victims recorded as “Unknown” are extremely rare, never more than a handful of cases in any year.
Takeaway: Young to middle-aged adults (18–44) are the most impacted as victims, while a large portion of perpetrator ages remain unrecorded.
Figure 2:
This chart compares the sex of perpetrators and victims in shootings between 2020 and 2024.
Perpetrators: In 2020–2021, most were recorded as Unknown, but by 2022–2024, males accounted for nearly 60%, overtaking the Unknown category. Females consistently stayed under 2%.
Victims: The vast majority are male (over 80%), with females representing 10–12% across all years. Unknown victims are negligible, appearing only once in 2023.
Takeaway: Victims are overwhelmingly male, while perpetrator records improved after 2021, with more identified as male rather than left as unknown.
Figure 3:
This chart compares perpetrators and victims in shootings by race/ethnicity from 2020 to 2024.
Perpetrators: The largest categories shift between Unknown and Black, depending on the year. In 2022, Black perpetrators overtook Unknown. White Hispanic and Black Hispanic follow, while White and Asian/Pacific Islander are small minorities.
Victims: Black individuals dominate, consistently making up 65–72% of victims. White Hispanic victims are the second-largest group, followed by Black Hispanic and White victims. Asian/Pacific Islander and American Indian/Alaskan Native victims appear in very small proportions.
Takeaway: Victim demographics are consistent, with Black individuals disproportionately affected, while perpetrator data is less stable due to many cases recorded as “Unknown.”
Murder Flag Analysis
Figure 1:
This chart shows the percentage of shootings that resulted in a victim’s death between 2020 and 2024.
Fatal shootings made up roughly 19–21% of all incidents each year.
The highest level occurred in 2021 (21.3%), while the lowest was in 2020 (18.8%).
Takeaway: The percentage of shootings that end in death has remained relatively stable, even as overall incident counts have declined.
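The fatal share is the fraction of incidents in a year whose `STATISTICAL_MURDER_FLAG` is true. A minimal sketch under stated assumptions: the miniature table below is hypothetical, and grouping by `YEAR` is an illustration rather than the report's confirmed method. The last line repeats the arithmetic with the 2024 figures quoted earlier in this report (239 fatal shootings out of 1,181 incidents).

```python
import pandas as pd

# Hypothetical miniature dataset: 5 incidents in 2023 (1 fatal), 4 in 2024 (1 fatal).
toy = pd.DataFrame(
    {
        "YEAR": [2023] * 5 + [2024] * 4,
        "STATISTICAL_MURDER_FLAG": [
            True, False, False, False, False,
            True, False, False, False,
        ],
    }
)

# The mean of a boolean column is the fraction of True values.
fatal_share = (toy.groupby("YEAR")["STATISTICAL_MURDER_FLAG"].mean() * 100).round(1)
print(fatal_share)

# Same arithmetic with the 2024 figures quoted in this report.
share_2024 = round(239 / 1181 * 100, 1)
print(share_2024)
```

The 2024 result falls inside the 19–21% band described above, consistent with a stable fatality share despite lower counts.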
Conclusion
This analysis of NYPD shooting incident data from 2020 to 2024 shows that gun violence in New York City has declined steadily since its peak in 2021. While the city recorded over 2,000 shootings in 2021, incidents dropped to 1,181 by 2024, reflecting meaningful progress.
The data also highlights that gun violence is not evenly distributed. Shootings are concentrated in a small number of precincts: in 2024, eight of the ten highest-count precincts were in the Bronx and two were in Brooklyn. At the borough level, the Bronx and Manhattan saw small increases in 2024, while Brooklyn and Queens showed consistent declines. Temporal patterns indicate that shootings remain most common during nighttime and summer months, though these peaks have weakened in recent years.
Demographic trends reveal that young to middle-aged Black males are disproportionately impacted as victims, while gaps in perpetrator data (notably large “Unknown” categories) limit deeper analysis. The proportion of shootings that result in death has remained relatively stable at around 19–21%, suggesting that while the number of incidents has fallen, their lethality has not significantly changed.
Recommendations
Target geographic hotspots: Focus prevention and intervention strategies on the Bronx and Brooklyn precincts that rank highest for shooting incidents, and monitor the 2024 upticks in the Bronx and Manhattan.
Address peak times: Maintain heightened prevention efforts during nighttime hours and summer months, when shootings are most frequent.
Improve data quality: Strengthen NYPD data collection on perpetrators, particularly age and sex, to reduce the reliance on “Unknown” categories and improve future analysis.
Sustain monitoring: Continue tracking trends annually to ensure recent declines are maintained and to quickly identify any emerging patterns.
Community-focused strategies: Pair enforcement with community programs addressing root causes of violence in the hardest-hit neighborhoods.
Source Code
---
title: "NYPD Shooting History Data Analysis"
author: "Christopher Legarda"
date: today
jupyter: python3

# Page + theme
format:
  html:
    theme: cosmo
    toc: true
    toc-location: left
    toc-depth: 3
    number-sections: false
    code-fold: true
    code-summary: "Show code"
    code-tools: true
    df-print: paged
    smooth-scroll: true
    anchor-sections: true
    fig-width: 8
    fig-height: 5
    fig-align: center
    tbl-cap-location: top
    fig-cap-location: bottom
  # Uncomment if/when you want PDF or Word:
  # pdf:
  #   documentclass: scrreprt
  #   toc: true
  #   number-sections: true
  # docx:
  #   toc: true

# Execution controls
execute:
  echo: false
  include: false
  warning: false
  message: false
  cache: true
  freeze: auto  # re-run only when code changes

# Nice title banner
title-block-banner: true
page-layout: full

# (Optional) params for quick filtering, etc.
# params:
#   year_min: 2020
#   year_max: 2024
---

<!-- ## **Data Preparation** -->

```{python}
import pandas as pd
import webbrowser
import tempfile
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import geopandas as gpd
import folium
from folium.plugins import MarkerCluster
from folium.plugins import HeatMap
from IPython.display import display, HTML
```

```{python}
file_path = r"C:\Users\Christopher\Documents\Python Projects\NYPD_Shooting_incident\NYPD_Shooting_Incident_Data__Historic_.csv"
df = pd.read_csv(file_path)
```

```{python}
df.head()
```

```{python}
# Check basic info about the dataset
df.info()
```

```{python}
# Check basic statistics (only for numeric columns)
df.describe()
```

```{python}
# Check for missing values
df.isnull().sum()
```

<!-- ### Cleaning the dataset: -->

```{python}
# Handling duplicates
df.duplicated().sum()
```

```{python}
df.columns
```

```{python}
# Fixing data types
df['OCCUR_DATE'] = pd.to_datetime(df['OCCUR_DATE'], errors='coerce')
df['OCCUR_TIME'] = pd.to_datetime(df['OCCUR_TIME'], format='%H:%M:%S', errors='coerce').dt.time
df['JURISDICTION_CODE'] = pd.to_numeric(df['JURISDICTION_CODE'], errors='coerce').astype('Int64')
# Coordinates may contain thousands separators; strip them before converting
df['X_COORD_CD'] = df['X_COORD_CD'].astype(str).str.replace(',', '', regex=False)
df['X_COORD_CD'] = pd.to_numeric(df['X_COORD_CD'], errors='coerce')
df['Y_COORD_CD'] = df['Y_COORD_CD'].astype(str).str.replace(',', '', regex=False)
df['Y_COORD_CD'] = pd.to_numeric(df['Y_COORD_CD'], errors='coerce')
```

```{python}
df.isnull().sum()
```

```{python}
df.info()
```

```{python}
categorical_cols = ['BORO', 'LOC_OF_OCCUR_DESC', 'LOC_CLASSFCTN_DESC', 'LOCATION_DESC',
                    'PERP_AGE_GROUP', 'PERP_SEX', 'PERP_RACE',
                    'VIC_AGE_GROUP', 'VIC_SEX', 'VIC_RACE']
for col in categorical_cols:
    df[col] = df[col].fillna('Unknown')
```

```{python}
df = df.replace('(Null)', 'Unknown')
```

```{python}
for col in categorical_cols:
    df[col] = df[col].str.strip().str.title()
```

```{python}
df['PERP_AGE_GROUP'].nunique()
```

```{python}
df['PERP_AGE_GROUP'].unique()
```

```{python}
df['PERP_AGE_GROUP'].value_counts()
```

```{python}
# Dropping invalid age group
invalid_age_group = ['1028', '1020', '940', '224', '2021', '1022']
df = df[~df['PERP_AGE_GROUP'].isin(invalid_age_group)]
df = df[~df['VIC_AGE_GROUP'].isin(invalid_age_group)]
```

```{python}
df['PERP_AGE_GROUP'].value_counts()
```

```{python}
df['PERP_AGE_GROUP'] = df['PERP_AGE_GROUP'].replace('(Null)', 'Unknown')
```

```{python}
df['VIC_AGE_GROUP'].value_counts()
```

```{python}
df['PERP_SEX'].value_counts()
```

```{python}
df['VIC_SEX'].value_counts()
```

```{python}
# Map short codes to full names
perp_sex_mapping = {
    'M': 'Male',
    'F': 'Female',
    'U': 'Unknown',
    'Unknown': 'Unknown',
    '(Null)': 'Unknown'
}
vic_sex_mapping = {
    'M': 'Male',
    'F': 'Female',
    'U': 'Unknown'
}
df = df.copy()
df['PERP_SEX'] = df['PERP_SEX'].map(perp_sex_mapping)
df['VIC_SEX'] = df['VIC_SEX'].map(vic_sex_mapping)
```

```{python}
df['PERP_SEX'].value_counts()
```

```{python}
df['VIC_SEX'].value_counts()
```

```{python}
df['PERP_RACE'].value_counts()
```

```{python}
df['VIC_RACE'].value_counts()
```

```{python}
prep_race_mapping = {
    'Black': 'Black',
    'Unknown': 'Unknown',
    'White Hispanic': 'White Hispanic',
    '(Null)': 'Unknown',
    'Black Hispanic': 'Black Hispanic',
    'White': 'White',
    'Asian/Pacific Islander': 'Asian/Pacific Islander',
    'American Indian/Alaskan Native': 'American Indian/Alaskan Native'
}
vic_race_mapping = {
    'Black': 'Black',
    'White Hispanic': 'White Hispanic',
    'Black Hispanic': 'Black Hispanic',
    'White': 'White',
    'Asian / Pacific Islander': 'Asian/Pacific Islander',
    'Unknown': 'Unknown',
    'American Indian/Alaskan Native': 'American Indian/Alaskan Native'
}
df = df.copy()
df['PERP_RACE'] = df['PERP_RACE'].map(prep_race_mapping)
df['VIC_RACE'] = df['VIC_RACE'].map(vic_race_mapping)
```

```{python}
df['PERP_RACE'].value_counts()
```

```{python}
df['VIC_RACE'].value_counts()
```

```{python}
type(df)
```

```{python}
df['LOC_OF_OCCUR_DESC'].unique()
```

```{python}
df['JURISDICTION_CODE'].unique()
```

```{python}
df['JURISDICTION_CODE'].value_counts()
```

```{python}
df[df['JURISDICTION_CODE'].isna()]
```

```{python}
df['JURISDICTION_CODE'] = df['JURISDICTION_CODE'].fillna(-1)
jurisdiction_mapping = {
    0: 'NYPD Patrol',
    1: 'NYPD Transit',
    2: 'NYPD Housing',
    -1: 'Unknown'
}
df['JURISDICTION_CODE'] = df['JURISDICTION_CODE'].map(jurisdiction_mapping)
```

```{python}
df['JURISDICTION_CODE'].value_counts()
```

```{python}
df['LOC_CLASSFCTN_DESC'].value_counts()
```

```{python}
df['LOC_CLASSFCTN_DESC'] = df['LOC_CLASSFCTN_DESC'].replace('(Null)', 'Unknown')
df['LOC_CLASSFCTN_DESC'].value_counts()
```

```{python}
df['LOCATION_DESC'].value_counts()
```

```{python}
df['LOCATION_DESC'] = df['LOCATION_DESC'].replace('(Null)', 'Unknown')
df['LOCATION_DESC'].value_counts()
```

```{python}
df.isnull().sum()
```

```{python}
df['PERP_RACE'].unique()
```

```{python}
df['PERP_RACE'] = df['PERP_RACE'].fillna('Unknown')
df['PERP_RACE'].unique()
```

```{python}
df.isnull().sum()
```

```{python}
df.info()
```

<!-- ## **Exploratory Data Analysis (EDA)**
## **Summary Statistics** -->

```{python}
import numpy as np
```

```{python}
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 100)
pd.set_option("display.float_format", lambda x: f"{x:,.4f}")
```

```{python}
# Numerical summary
num_cols = df.select_dtypes(include=["number"]).columns.tolist()
num_cols
```

```{python}
if len(num_cols) > 0:
    core_stats = df[num_cols].agg(
        ["count", "mean", "median", "std", "var", "min", "max"]
    ).T
    pct = df[num_cols].quantile([0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]).T
    pct.columns = [f"p{int(p*100)}" for p in pct.columns]
    iqr = (pct["p75"] - pct["p25"]).rename("IQR")
    numeric_summary = core_stats.join(pct, how="left").join(iqr, how="left")
    print("Numerical Summary Statistics")
    display(numeric_summary)
else:
    print("No Numerical columns found.")
```

```{python}
# Categorical summary
cat_cols = ["BORO", "JURISDICTION_CODE",
            "PERP_RACE", "VIC_RACE",
            "PERP_SEX", "VIC_SEX",
            "PERP_AGE_GROUP", "VIC_AGE_GROUP"]
# cat_cols = [c for c in cat_cols if c in df.columns]
# cat_cols
filtered = []
for c in cat_cols:
    if c in df.columns:
        filtered.append(c)
cat_cols = filtered
cat_cols
```

```{python}
def value_counts_table(series: pd.Series, dropna=False):
    # Counts and percentages for one categorical column
    vc = series.value_counts(dropna=dropna)
    pct = (vc / vc.sum() * 100).round(2)
    out = pd.DataFrame({"count": vc, "percent": pct})
    return out

print("\nCategorical Value Counts")
for col in cat_cols:
    print(f"\n- {col} - ")
    display(value_counts_table(df[col]))
```

<!-- ## **Univariate Analysis** -->

```{python}
import matplotlib.pyplot as plt
```

```{python}
# 1) Shootings Over Years
# Extract just the year
df['YEAR'] = df['OCCUR_DATE'].dt.year
plt.figure(figsize=(10, 6))
df['YEAR'].hist(bins=len(df['YEAR'].unique()), edgecolor='black')
plt.title("Shootings Over Years")
plt.xlabel("Year")
plt.ylabel("Number of Incidents")
plt.show()
```

```{python}
# 2) Shootings by Hour of Day
if ('HOUR' not in df.columns) or df['HOUR'].isna().all():
    oc = df['OCCUR_TIME']
    # Parse hour from strings like "HH:MM:SS" (works even if they’re dtype object)
    hours = pd.to_datetime(oc.astype(str), errors='coerce').dt.hour
    # Handle rare "24:00:00" → hour 0
    mask_24 = oc.astype(str).str.fullmatch(r'24:00:00', na=False)
    hours = hours.mask(mask_24, 0)
    df['HOUR'] = hours

print("Non-null HOUR values:", df['HOUR'].notna().sum())
h = df['HOUR'].dropna().astype(int)
if h.empty:
    print("No valid hours to plot (all parsed as NaN). Inspect OCCUR_TIME formatting.")
else:
    plt.figure(figsize=(10, 6))
    h.hist(bins=range(0, 25), edgecolor='black', align='left')
    plt.xticks(range(0, 24))
    plt.title("Shooting by Hour of Day")
    plt.xlabel("Hour of Day (0 = Midnight)")
    plt.ylabel("Number of Incidents")
    plt.tight_layout()
    plt.show()
```

```{python}
#| scrolled: true
# 3) Shooting by Borough
plt.figure(figsize=(10, 6))
df['BORO'].value_counts().plot(kind='bar', edgecolor='black')
plt.title('Shooting by Borough')
plt.xlabel('Borough')
plt.ylabel('Number of Incidents')
plt.tight_layout()
plt.show()
```

```{python}
# 4) Perp/Victim Age Group
fig, axes = plt.subplots(1, 2, figsize=(14, 6), sharey=True)
df['PERP_AGE_GROUP'].value_counts().plot(kind='bar', ax=axes[0], edgecolor='black')
axes[0].set_title("Shootings by Perp Age Group")
axes[0].set_xlabel("Age Group")
axes[0].set_ylabel("Number of Incidents")
df['VIC_AGE_GROUP'].value_counts().plot(kind='bar', ax=axes[1], edgecolor='black')
axes[1].set_title("Shootings by Victim Age Group")
axes[1].set_xlabel("Age Group")
plt.tight_layout()
plt.show()
```

```{python}
# 5) Perp/Victim Sex
fig, axes = plt.subplots(1, 2, figsize=(10, 5), sharey=True)
df['PERP_SEX'].value_counts().plot(kind='bar', ax=axes[0], edgecolor='black')
axes[0].set_title("Shooting by Perp Sex")
axes[0].set_xlabel("Sex")
df['VIC_SEX'].value_counts().plot(kind='bar', ax=axes[1], edgecolor='black')
axes[1].set_title("Shootings by Victim Sex")
axes[1].set_xlabel("Sex")
plt.tight_layout()
plt.show()
```

```{python}
#| scrolled: true
# 6) Perp/Victim Race
fig, axes = plt.subplots(1, 2, figsize=(14, 6), sharey=True)
df['PERP_RACE'].value_counts().plot(kind='bar', ax=axes[0], edgecolor='black')
axes[0].set_title("Shooting by Perp Race")
axes[0].set_xlabel("Race")
df['VIC_RACE'].value_counts().plot(kind='bar', ax=axes[1], edgecolor='black')
axes[1].set_title("Shootings by Victim Race")
axes[1].set_xlabel("Race")
plt.tight_layout()
plt.show()
```

<!-- ## **Bivariate Analysis** -->

```{python}
# 1) Crosstab: PERP_RACE vs VIC_RACE
print("\nCrosstab: PERP_RACE vs VIC_RACE")
race_ct_counts = pd.crosstab(df['PERP_RACE'], df['VIC_RACE'], margins=True)
print("\nRaw Counts:")
display(race_ct_counts)
# normalize='index' makes each row sum to 100%
race_ct_pct = pd.crosstab(df['PERP_RACE'], df['VIC_RACE'], normalize='index') * 100
print("\nRow-Normalized Percentage (%):")
display(race_ct_pct.round(1))
```

```{python}
# 2) Murder rate (STATISTICAL_MURDER_FLAG) by BORO
print("\nMurder Rate by Borough")
murder_boro_counts = pd.crosstab(df['BORO'], df['STATISTICAL_MURDER_FLAG'])
print("\nRaw Counts:")
display(murder_boro_counts)
murder_boro_pct = df.groupby('BORO')['STATISTICAL_MURDER_FLAG'].mean() * 100
print("\nMurder Rate (%):")
display(murder_boro_pct.round(2))
murder_boro_pct.sort_values(ascending=False).plot(
    kind='bar', figsize=(8, 5), edgecolor='black',
    title="Murder Rate (%) by Borough"
)
plt.ylabel("Murder Rate (%)")
plt.show()
```

```{python}
# 3) Murder rate by PRECINCT
print("\nMurder Rate by Precinct")
murder_precinct_counts = pd.crosstab(df['PRECINCT'], df['STATISTICAL_MURDER_FLAG'])
print("\nRaw Counts:")
display(murder_precinct_counts.head(10))
murder_precinct_pct = df.groupby('PRECINCT')['STATISTICAL_MURDER_FLAG'].mean() * 100
print("\nMurder Rate (%)")
display(murder_precinct_pct.round(2).head(10))
# Plot top 15
murder_precinct_pct.sort_values(ascending=False).head(15).plot(
    kind='bar', figsize=(12, 6), edgecolor='black'
)
plt.title("Top 15 Precincts by Murder Rate (%)")
plt.ylabel("Murder Rate (%)")
plt.show()
```

```{python}
# 4) JURISDICTION_CODE vs LOCATION_DESC
print("\nCrosstab: JURISDICTION_CODE vs LOCATION_DESC")
# Raw counts
jur_loc_counts = pd.crosstab(df['JURISDICTION_CODE'], df['LOCATION_DESC'])
print("\nRaw Counts")
display(jur_loc_counts.head(15))
# Row-normalized %
jur_loc_pct = pd.crosstab(df['JURISDICTION_CODE'], df['LOCATION_DESC'], normalize='index') * 100
print("\nRow-Normalized Percentages (%):")
display(jur_loc_pct.round(2).head(10))
```

<!-- ## **Multivariate Analysis** -->

```{python}
# 1) Trends by Year + Borough + Murder flag
print("\nTrends by Year, Borough, and Murder Flag")
# Extract year from OCCUR_DATE
df['YEAR'] = df['OCCUR_DATE'].dt.year
# Group by Year, Borough, and Murder flag
trend = df.groupby(['YEAR', 'BORO', 'STATISTICAL_MURDER_FLAG']).size().reset_index(name='count')
# Pivot for easier plotting
trend_pivot = trend.pivot_table(index=['YEAR', 'BORO'],
                                columns='STATISTICAL_MURDER_FLAG',
                                values='count',
                                fill_value=0)
trend_pivot.columns = ['Non-Murder', 'Murder']
display(trend_pivot.head(10))
```

```{python}
# 2) Age Group + Race + Murder Flag
print("\nAge Group + Race + Murder Flag")
# Crosstab
age_race_murder = pd.crosstab([df['PERP_AGE_GROUP'], df['PERP_RACE']], df['STATISTICAL_MURDER_FLAG'])
age_race_murder['TOTAL'] = age_race_murder.sum(axis=1)
display(age_race_murder)
```

```{python}
# 3) Victim vs Perpetrator Demographics Breakdown
print("\nVictim vs Perpetrator Demographics")
# Crosstab Race
race_ct = pd.crosstab(df['PERP_RACE'], df['VIC_RACE'], normalize='index') * 100
print("\nRace Crosstab (%):")
display(race_ct.round(1))
# Crosstab Sex
sex_ct = pd.crosstab(df['PERP_SEX'], df['VIC_SEX'], normalize='index') * 100
print("\nSex Crosstab (%)")
display(sex_ct.round(1))
# Crosstab Age
age_ct = pd.crosstab(df['PERP_AGE_GROUP'], df['VIC_AGE_GROUP'], normalize='index') * 100
print("\nAge Crosstab (%):")
display(age_ct.round(1))
```

```{python}
print("\nOut-of-Bounds Coordinates")
x_min, x_max = 912000, 1067000
y_min, y_max = 120000, 272000
invalid_coords = df[
    (df['X_COORD_CD'] < x_min) | (df['X_COORD_CD'] > x_max) |
    (df['Y_COORD_CD'] < y_min) | (df['Y_COORD_CD'] > y_max)
]
print(f"Total invalid coords: {len(invalid_coords)}")
display(invalid_coords[['INCIDENT_KEY', 'BORO', 'X_COORD_CD', 'Y_COORD_CD']].head(10))
```

```{python}
print("\nChecking for suspicious OCCUR_TIME values (exact midnight)...")
# Flag only rows with exact "00:00:00".
# OCCUR_TIME holds datetime.time objects after cleaning, so compare on the string form.
midnight_exact = df[df['OCCUR_TIME'].astype(str) == "00:00:00"]
print(f"Exact midnight incidents: {len(midnight_exact)}")
display(midnight_exact[['INCIDENT_KEY', 'OCCUR_DATE', 'OCCUR_TIME', 'BORO']].head(10))
```

```{python}
print("\nPERP_AGE_GROUP rare values check")
display(df['PERP_AGE_GROUP'].value_counts())
print("\nVIC_AGE_GROUP rare values check")
display(df['VIC_AGE_GROUP'].value_counts())
```

```{python}
print("\nPERP_RACE low-frequency values")
display(df['PERP_RACE'].value_counts(normalize=True).tail())
print("\nVIC_RACE low-frequency values")
display(df['VIC_RACE'].value_counts(normalize=True).tail())
print("\nPERP_SEX low-frequency values")
display(df['PERP_SEX'].value_counts(normalize=True).tail())
print("\nVIC_SEX low-frequency values")
display(df['VIC_SEX'].value_counts(normalize=True).tail())
```

```{python}
# # Create temporary HTML file
# with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.html') as f:
#     df.to_html(f.name)
#     webbrowser.open(f'file://{f.name}')
```

## Introduction

In the past five years, gun violence has remained a major concern for New Yorkers. Surveys consistently show that the vast majority of residents consider crime to be a very or somewhat serious issue, with many expressing skepticism about progress in reducing gun violence. At the same time, others note that overall crime rates have been trending downward, with some arguing that New York City feels safer than in previous years.

Amid this debate, public agencies continue to prioritize transparency, accountability, and data-driven decision-making. One area where this approach is especially valuable is in crime reporting and prevention.

This report analyzes NYPD shooting incident data to uncover trends, patterns, and factors that shape the distribution of gun violence. Specifically, the analysis focuses on data from 2020 through 2024, with the objectives of identifying geographic hotspots, temporal patterns, and demographic trends.

The dataset used in this analysis is publicly available through NYC Open Data, with further details provided in the Data section.
The report will briefly describe the structure and types of data included, but it will not dive into technical details of data cleaning or coding. Instead, the focus is on presenting clear insights through exploratory data analysis (EDA), visualization, and statistical summaries.

The following sections detail the dataset, analytical approach, results, and key recommendations.

## About the Data

This analysis uses the **NYPD Shooting Incident Data – Historic**, which is publicly available through [NYC Open Data](https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8/about_data). The dataset contains detailed records of shooting incidents reported in New York City. For this report, the focus is on the years **2020 through 2024**.

At the time of import, the dataset included **29,744 records (rows)** and **21 columns (variables)**. After preparing and cleaning the data (e.g., converting dates into usable formats, handling missing values, and standardizing variable types), the final dataset contained **29,738 records**. The cleaning process was carried out in code and is not covered in this report, since the intended audience is non-technical.

### Data Structure

Each row in the dataset represents a single shooting incident.
The key information captured includes:

- **Incident details**
    - `INCIDENT_KEY`: Unique identifier for each incident
    - `OCCUR_DATE` and `OCCUR_TIME`: Date and time of the incident
    - `BORO` and `PRECINCT`: Location by borough and police precinct
    - `JURISDICTION_CODE`: Indicates whether the incident fell under NYPD (Patrol, Transit, Housing) or non-NYPD jurisdiction
- **Participants**
    - `PERP_AGE_GROUP`, `PERP_SEX`, `PERP_RACE`: Age group, sex, and race of the perpetrator
    - `VIC_AGE_GROUP`, `VIC_SEX`, `VIC_RACE`: Age group, sex, and race of the victim
- **Incident outcomes**
    - `STATISTICAL_MURDER_FLAG`: Indicates if the shooting resulted in a death
- **Location details**
    - `LOCATION_DESC`: Description of where the incident occurred
    - `X_COORD_CD`, `Y_COORD_CD`, `Latitude`, `Longitude`, `Lon_Lat`: Geographic coordinates of the incident

### Notes on Data Use

- The dataset includes both categorical information (e.g., borough, sex, race, age group) and numeric/geographic information (e.g., coordinates, dates, times).
- While the raw dataset contained some missing or inconsistent values, these were addressed during data preparation. The details of this process are available in the underlying code, but are not presented in this report.
- The goal of this section is to provide an overview of what the data contains, rather than the technical steps of cleaning.
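
The preparation steps mentioned in this section (converting dates, standardizing variable types, handling missing values) are not shown in the report. As a rough illustration only, the sketch below applies steps of that kind to a few made-up rows. The `prepare` helper, the toy values, the `"UNKNOWN"` label, and the decision to drop rows with unparseable dates are all assumptions for this example, not the cleaning that was actually applied to the dataset.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT records from the NYPD dataset.
raw = pd.DataFrame({
    "INCIDENT_KEY": [1, 2, 3],
    "OCCUR_DATE": ["01/15/2021", "not a date", "07/04/2023"],
    "OCCUR_TIME": ["23:10:00", "02:45:00", "14:05:00"],
    "BORO": ["BRONX", "QUEENS", "BROOKLYN"],
    "STATISTICAL_MURDER_FLAG": ["true", "false", "false"],
    "PERP_AGE_GROUP": ["25-44", None, "(null)"],
})

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper: parse dates, standardize types, label missing values."""
    out = df.copy()
    # Convert date strings to datetimes; unparseable values become NaT.
    out["OCCUR_DATE"] = pd.to_datetime(out["OCCUR_DATE"], format="%m/%d/%Y", errors="coerce")
    # Standardize the murder flag to a real boolean column.
    out["STATISTICAL_MURDER_FLAG"] = (
        out["STATISTICAL_MURDER_FLAG"].astype(str).str.lower().eq("true")
    )
    # Label missing or placeholder age groups explicitly (assumed label: "UNKNOWN").
    age = out["PERP_AGE_GROUP"]
    out["PERP_AGE_GROUP"] = age.where(age.notna() & age.ne("(null)"), "UNKNOWN")
    # Drop rows whose date could not be parsed (an assumption for this sketch).
    return out.dropna(subset=["OCCUR_DATE"]).reset_index(drop=True)

clean = prepare(raw)
print(clean.dtypes)
print(len(raw), "->", len(clean), "rows")
```

A small drop in row count after a step like this would be consistent with the difference between the imported and final record counts reported above, although the report does not say which rows were removed or why.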
```{python}
#| echo: false
#| include: true
from tabulate import tabulate
#df
df.head(10).style.set_caption("Sample of NYPD Shooting Incident Data").hide(axis="index")
```

## **Overall Trends**

### **Figure 1:**

```{python}
# --- Pew-ish rc settings ---
def pew_rc(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size,
        "font.family": font_family,
        "text.color": "#222222",
        "figure.facecolor": "white",
        "axes.facecolor": "white",
        "axes.edgecolor": "white",
        "axes.labelcolor": "#333333",
        "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold",
        "axes.titlepad": 6,
        "xtick.color": "#424242",
        "ytick.color": "#222222",
        "axes.grid": False,
        "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"    # muted slate blue used for bars
PEW_GRAY = "#E5E5E5"
UP_COLOR = "#4B6E82"    # Pew-like slate blue
DOWN_COLOR = "#8C8C8C"  # muted gray for decreases
```

```{python}
#| echo: false
#| include: true
# --- Prepare data: total shootings per year 2020–2024 ---
df["YEAR"] = df["OCCUR_DATE"].dt.year
counts = (
    df.loc[df["YEAR"].between(2020, 2024), "YEAR"]
    .value_counts()
    .reindex([2020, 2021, 2022, 2023, 2024], fill_value=0)
)

with mpl.rc_context(pew_rc(base_size=12)):
    fig, ax = plt.subplots(figsize=(7.5, 4.8))

    # Horizontal bars
    bars = ax.barh([str(y) for y in counts.index], counts.values,
                   color=PEW_BLUE, edgecolor="none", height=0.6)

    # Thin left baseline (like Pew’s vertical axis line)
    ax.axvline(0, color="#555555", linewidth=0.8)

    # Remove spines/ticks for a clean look; keep y labels
    for spine in ["top", "right", "bottom"]:
        ax.spines[spine].set_visible(False)
    ax.xaxis.set_visible(False)  # hide x ticks/labels
    ax.margins(x=0.05)           # breathing room on the right

    # Value labels inside bars
    for b in bars:
        val = int(b.get_width())
        ax.text(b.get_width() - (0.01*counts.max() + 10),  # a bit inside the bar
                b.get_y() + b.get_height()/2,
                f"{val:,}", va="center", ha="right",
                color="white", fontsize=10, fontweight="bold")

    # Title + subtitle (subtitle italic, left-aligned)
    fig.suptitle("Total Shootings per Year (2020–2024)", x=0.06, ha="left",
                 va="bottom", fontsize=15, fontweight="bold")
    #ax.set_title("Citywide totals; 2024 may be partial-year", loc="left",
    #fontsize=12, color="#666666", style="italic", pad=8)

    # Source / note footer (left aligned under plot)
    fig.text(0.06, 0.02,
             "Source: NYPD Shooting Incident Data. Note: Bars show total incident counts per calendar year.",
             ha="left", va="bottom", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This chart shows the total number of shooting incidents reported in New York City from 2020 through 2024.

- The highest count occurred in **2021 with 2,011 incidents**.
- **2020** was close behind, with **1,948 incidents**.
- Shootings declined each year after 2021, reaching **1,181 incidents in 2024**, the lowest in the period.

**Takeaway:** Shootings surged during the early pandemic years but have declined every year since 2021, showing measurable progress in reducing gun violence.

### **Figure 2:**

```{python}
#| echo: false
#| include: true
# --- Prepare data: totals per year, then YoY % ---
df["YEAR"] = df["OCCUR_DATE"].dt.year
counts = (
    df.loc[df["YEAR"].between(2020, 2024), "YEAR"]
    .value_counts()
    .reindex([2020, 2021, 2022, 2023, 2024], fill_value=0)
)
yoy = counts.pct_change() * 100               # 2021–2024 result
yoy = yoy.iloc[1:]                            # drop 2020 (no prior year)
yoy = yoy.replace([np.inf, -np.inf], np.nan)  # guard div/0
yoy = yoy.fillna(0)

with mpl.rc_context(pew_rc(base_size=12)):
    fig, ax = plt.subplots(figsize=(7.5, 4.8))
    years = [str(y) for y in yoy.index]  # "2021"..."2024"
    values = yoy.values
    colors = [UP_COLOR if v >= 0 else DOWN_COLOR for v in values]
    bars = ax.barh(years, values, color=colors, edgecolor="none", height=0.6)

    # Thin zero baseline
    ax.axvline(0, color="#555555", linewidth=0.8)

    # Clean look: hide boxy spines & x-axis labels
    for spine in ["top", "right", "bottom"]:
        ax.spines[spine].set_visible(False)
    ax.xaxis.set_visible(False)
    ax.margins(x=0.10)  # whitespace to the left/right

    # Labels inside (or just outside for small bars), with sign
    max_abs = max(1, np.nanmax(np.abs(values)))
    pad = 0.02 * max_abs
    for b, v in zip(bars, values):
        txt = f"{v:+.1f}%"
        inside = abs(v) > pad*3  # bar is long enough to hold its label
        if v >= 0:
            x = v - (pad if inside else -pad)  # if tiny, place outside (to the right)
            ha = "right" if inside else "left"
        else:
            x = v + (pad if inside else -pad)  # if tiny, place outside (to the left)
            ha = "left" if inside else "right"
        color = "white" if inside else "#222222"
        ax.text(x, b.get_y() + b.get_height()/2, txt,
                va="center", ha=ha, fontsize=10, fontweight="bold", color=color)

    # Titles + note
    fig.suptitle("Year-over-Year % Change in Shootings (2021–2024)", x=0.06, ha="left",
                 fontsize=15, fontweight="bold")
    #ax.set_title("Change from prior year; 2024 may be partial-year",
    #loc="left", fontsize=12, color="#666666", style="italic", pad=8)
    fig.text(0.06, 0.02,
             "Source: NYPD Shooting Incident Data. Note: Values show percent change vs prior year.",
             ha="left", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This chart shows how the number of shootings changed from one year to the next.

- From **2020 to 2021**, shootings increased by **3.2%**.
- Incidents then fell sharply: **–14.7% in 2022** and **–27.2% in 2023**, the steepest decline in the five-year period.
- In **2024**, shootings continued to decrease, though at a slower pace (**–5.4%**).

**Takeaway:** While shootings rose slightly in 2021, the following years saw consistent and significant declines, though the pace of improvement slowed in 2024.

```{python}
# Extract year
df["YEAR"] = df["OCCUR_DATE"].dt.year

# Count shootings per year 2020–2024
counts = (
    df.loc[df["YEAR"].between(2020, 2024), "YEAR"]
    .value_counts()
    .reindex([2020, 2021, 2022, 2023, 2024], fill_value=0)
)

# Year-over-year % change
yoy = counts.pct_change() * 100

# Build summary table
summary = pd.DataFrame({
    "Total Shootings": counts,
    "YoY % Change": yoy.round(1)
})
summary
```

<!-- ### Line chart showing **5-year trend** in shootings and murder flag (STATISTICAL_MURDER_FLAG).
-->

```{python}
# --- Pew-ish rc for line charts (y-grid on, clean look) ---
def pew_rc_line(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size,
        "font.family": font_family,
        "text.color": "#222222",
        "figure.facecolor": "white",
        "axes.facecolor": "white",
        "axes.edgecolor": "white",
        "axes.labelcolor": "#333333",
        "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold",
        "axes.titlepad": 6,
        "xtick.color": "#424242",
        "ytick.color": "#424242",
        "xtick.direction": "out",
        "ytick.direction": "out",
        "axes.grid": True,  # light y-grid like Pew
        "grid.color": "#E5E5E5",
        "grid.linewidth": 0.6,
        "axes.grid.axis": "y",
        "legend.frameon": False,
    }

# Pew colors
PEW_BLUE = "#4B6E82"
PEW_ORANGE = "#C64700"
```

```{python}
# Define year range
years = [2020, 2021, 2022, 2023, 2024]

# Make sure YEAR column exists
df["YEAR"] = df["OCCUR_DATE"].dt.year

# Total shootings per year
shootings = (
    df.loc[df["YEAR"].between(2020, 2024), "YEAR"]
    .value_counts()
    .reindex(years, fill_value=0)
)

# Convert STATISTICAL_MURDER_FLAG (True/False → 1/0)
murders = (
    df.loc[df["YEAR"].between(2020, 2024)]
    .groupby("YEAR")["STATISTICAL_MURDER_FLAG"]
    .sum()
    .reindex(years, fill_value=0)
)
```

### **Figure 3:**

```{python}
#| echo: false
#| include: true
with mpl.rc_context(pew_rc_line(base_size=12)):
    fig, ax = plt.subplots(figsize=(7.8, 4.8))
    ax.plot(years, shootings.values, marker="o", linewidth=2.5,
            color=PEW_BLUE, label="Shootings (all incidents)")
    ax.plot(years, murders.values, marker="o", linewidth=2.5,
            color=PEW_ORANGE, label="Shootings resulting in death")

    # Clean axes
    for spine in ["top", "right"]:
        ax.spines[spine].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xticks(years)
    ax.set_ylabel("Number of incidents")

    # Label last points
    def annotate_last(x_vals, y_vals, color):
        x_last, y_last = x_vals[-1], y_vals[-1]
        ax.annotate(f"{int(y_last):,}", xy=(x_last, y_last), xytext=(6, 0),
                    textcoords="offset points", va="center", ha="left",
                    fontsize=11, fontweight="bold", color=color)
    annotate_last(years, shootings.values, PEW_BLUE)
    annotate_last(years, murders.values, PEW_ORANGE)

    # Titles
    fig.suptitle("5-Year Trend in Shooting Incidents (All vs. Resulting in Death), 2020–2024",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    #ax.set_title("Citywide totals; 2024 may be partial-year",
    #loc="left", fontsize=12, color="#666666", style="italic", pad=8)

    # Legend moved above chart area
    ax.legend(loc="lower left", bbox_to_anchor=(0, 1.05), ncol=1, frameon=False)

    # Source
    fig.text(0.06, 0.02,
             "Source: NYPD Shooting Incident Data. Note: ‘Shootings resulting in death’ based on STATISTICAL_MURDER_FLAG.",
             ha="left", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This line chart compares all shooting incidents with those that resulted in a victim’s death.

- Total shootings peaked in **2021** at just over **2,000 incidents**, before declining to **1,181 in 2024**.
- Fatal shootings followed a similar pattern: a small rise in 2021, then steady decreases each year, reaching **239 deaths in 2024**.

**Takeaway:** Both overall shootings and fatal shootings have declined since 2021, showing fewer incidents and fewer deaths over time.
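
Counts alone do not show whether shootings became less deadly; that depends on the share of incidents that were fatal. The sketch below illustrates that calculation with a hypothetical `fatal_share` helper, using only the two 2024 figures quoted in this section (1,181 incidents and 239 deaths). Extending it to earlier years would require their counts from the dataset, which are not all stated in the text.

```python
import pandas as pd

def fatal_share(shootings: pd.Series, deaths: pd.Series) -> pd.Series:
    """Percent of shooting incidents flagged as fatal, per year (hypothetical helper)."""
    return (deaths / shootings * 100).round(1)

# Only the 2024 figures are quoted in the text; other years are left out on purpose.
shootings_by_year = pd.Series({2024: 1181})
deaths_by_year = pd.Series({2024: 239})

share = fatal_share(shootings_by_year, deaths_by_year)
print(share)  # 2024 -> 20.2
```

Comparing this percentage across years, rather than the raw death counts, is one way to separate "fewer shootings" from "less deadly shootings".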
## **Geographic Analysis**

### **Figure 1:**

```{python}
#| echo: false
#| include: true
# --- Pew-ish rc (line with light y-grid) ---
def pew_rc_line(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#424242", "ytick.color": "#424242",
        "xtick.direction": "out", "ytick.direction": "out",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "y",
        "legend.frameon": False,
    }

# 5 distinct, muted colors for boroughs
BORO_COLORS = {
    "Bronx": "#176D9C",
    "Brooklyn": "#C64700",
    "Manhattan": "#6A994E",
    "Queens": "#7A5195",
    "Staten Island": "#BC5090",
}

years = [2020, 2021, 2022, 2023, 2024]
df["YEAR"] = df["OCCUR_DATE"].dt.year
mask = df["YEAR"].between(2020, 2024)

# Pivot counts: rows=year, cols=boro
trend = (
    df.loc[mask]
    .groupby(["YEAR", "BORO"])
    .size()
    .unstack("BORO")
    .reindex(index=years, columns=sorted(df["BORO"].dropna().unique()))
    .fillna(0)
)

with mpl.rc_context(pew_rc_line(base_size=12)):
    fig, ax = plt.subplots(figsize=(8.2, 5.0))

    for boro in trend.columns:
        yvals = trend[boro].values.astype(int)
        color = BORO_COLORS.get(boro, "#888888")
        ax.plot(years, yvals, marker="o", linewidth=2.5, label=boro, color=color)
        # Label last point for each borough
        ax.annotate(f"{yvals[-1]:,}", xy=(years[-1], yvals[-1]), xytext=(6, 0),
                    textcoords="offset points", va="center", ha="left",
                    fontsize=10, fontweight="bold", color=color)

    # Clean spines
    for s in ["top", "right"]:
        ax.spines[s].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xticks(years)
    ax.set_ylabel("Number of incidents")

    # Legend above chart
    ax.legend(loc="lower left", bbox_to_anchor=(0, 1.05), ncol=3)

    # Titles & source
    fig.suptitle("Shootings by Borough: 5‑Year Trend (2020–2024)", x=0.06,
                 ha="left", fontsize=15, fontweight="bold")
    #ax.set_title("Citywide totals by borough; 2024 may be partial‑year",
    #loc="left", fontsize=12, color="#666666", style="italic", pad=8)
    fig.text(0.06, 0.02, "Source: NYPD Shooting Incident Data.",
             ha="left", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This chart breaks down total shootings by borough from 2020 through 2024.

- The **Bronx** showed a small increase in 2024, rising to **458 incidents**, while **Manhattan** also ticked upward to **215 incidents**.
- In contrast, **Brooklyn (338 incidents)** and **Queens (155 incidents)** followed steady downward trends.
- **Staten Island** remained the lowest and relatively stable, with a small decline in 2024.

**Takeaway:** Most boroughs are seeing long-term declines, but localized increases in the Bronx and Manhattan suggest areas where targeted interventions are most needed.

```{python}
# --- Pew-ish rc for horizontal bar charts ---
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size,
        "font.family": font_family,
        "text.color": "#222222",
        "figure.facecolor": "white",
        "axes.facecolor": "white",
        "axes.edgecolor": "white",
        "axes.labelcolor": "#333333",
        "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold",
        "axes.titlepad": 6,
        "xtick.color": "#424242",
        "ytick.color": "#222222",
        "axes.grid": False,
        "legend.frameon": False
    }
```

```{python}
# --- Count by precinct ---
# Restrict to 2020–2024 so the counts match the chart subtitle
# (a no-op if df is already limited to these years).
in_period = df["OCCUR_DATE"].dt.year.between(2020, 2024)
top10 = (
    df.loc[in_period, "PRECINCT"]
    .value_counts()
    .head(10)
    .sort_values(ascending=True)  # sort ascending for barh
)

with mpl.rc_context(pew_rc_bar(base_size=12)):  # re-use your Pew bar theme
    fig, ax = plt.subplots(figsize=(7.5, 5))
    bars = ax.barh(top10.index.astype(str), top10.values, color=PEW_BLUE, height=0.6)
    ax.axvline(0, color="#555555", linewidth=0.8)
    for s in ["top", "right", "bottom"]:
        ax.spines[s].set_visible(False)
    ax.xaxis.set_visible(False)

    # Value labels inside bars
    maxv = int(top10.max())
    pad = 0.01 * maxv + 8
    for b in bars:
        val = int(b.get_width())
        ax.text(b.get_width() - pad, b.get_y() + b.get_height()/2,
                f"{val:,}", va="center", ha="right",
                fontsize=11, fontweight="bold", color="white")

    fig.suptitle("Top 10 Precincts by Shooting Incidents", x=0.06, ha="left",
                 fontsize=15, fontweight="bold")
    ax.set_title("Total incidents, 2020–2024", loc="left", fontsize=12,
                 color="#666666", style="italic", pad=8)
    fig.text(0.06, 0.02, "Source: NYPD Shooting Incident Data.",
             ha="left", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

```{python}
#pip install folium
```

### **Figure 2:**

```{python}
#| echo: false
#| include: true
# --- Ensure datetime & year ---
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year

# --- Filter to 2024 ---
df_2024 = df[df["YEAR"] == 2024].copy()

# --- Citywide total (denominator) ---
total_2024 = len(df_2024)

# --- Group by precinct + borough, sort, take top 10 ---
top10_2024 = (
    df_2024.groupby(["PRECINCT", "BORO"], dropna=False)
    .size()
    .reset_index(name="Incident Count")
    .sort_values("Incident Count", ascending=False)
    .head(10)
    .reset_index(drop=True)
)

# --- Percent of citywide & cumulative ---
top10_2024["% of Citywide Shootings"] = (top10_2024["Incident Count"] / total_2024 * 100).round(1)
# Cumulate the counts, not the rounded percentages, so rounding error does not add up
top10_2024["Cumulative % of Shootings"] = (
    top10_2024["Incident Count"].cumsum() / total_2024 * 100
).round(1)

# --- Rename columns for non-technical readers ---
top10_2024 = top10_2024.rename(columns={
    "PRECINCT": "Police Precinct",
    "BORO": "Borough",
    "Incident Count": "Number of Incidents",
})

# --- Reorder columns using the NEW names ---
top10_2024 = top10_2024[
    ["Police Precinct", "Borough", "Number of Incidents",
     "% of Citywide Shootings", "Cumulative % of Shootings"]
]

# --- (Optional) Format percentages as strings for readability ---
top10_2024["% of Citywide Shootings"] = top10_2024["% of Citywide Shootings"].map("{:.1f}%".format)
top10_2024["Cumulative % of Shootings"] = top10_2024["Cumulative % of Shootings"].map("{:.1f}%".format)

top10_2024.style.set_caption("Top 10 Precincts by Shooting Incidents — 2024").hide(axis="index")
```

This table lists the 10 police precincts with the highest number of shooting incidents in 2024.

- The **44th Precinct (Bronx)** recorded the most, with **83 shootings (7.0% of citywide total)**.
- The **46th Precinct (Bronx)** ranked second, with **72 incidents (6.1%)**.
- Together, just these two precincts accounted for **13.1% of all shootings in 2024**.
- Across the list, **7 of the 10 precincts are in the Bronx**, underscoring its concentration of incidents.
- Collectively, the top 10 precincts made up nearly **45% of citywide shootings** in 2024.

**Takeaway:** Gun violence is highly concentrated, with the Bronx dominating the top precincts and nearly 45% of all 2024 incidents coming from only 10 precincts.

```{python}
#pip install jupyterlab_widgets
```

### **Figure 3:**

```{python}
#| echo: false
#| include: true
# ----- prep: year + GeoDataFrame -----
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].dropna(subset=["X_COORD_CD", "Y_COORD_CD"]).copy()

gdf = gpd.GeoDataFrame(
    df_yr,
    geometry=gpd.points_from_xy(df_yr["X_COORD_CD"], df_yr["Y_COORD_CD"]),
    crs="EPSG:2263"  # NYC State Plane
).to_crs(epsg=4326)
gdf["lat"] = gdf.geometry.y
gdf["lon"] = gdf.geometry.x

# ----- map -----
m = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles="CartoDB positron")

palette = ["#4B6E82", "#C64700", "#6A994E", "#7A5195", "#BC5090",
           "#888888", "#2F4B7C", "#FFA600", "#58508D", "#003F5C"]

def detail_html(row):
    dt = row["OCCUR_DATE"].date() if pd.notna(row["OCCUR_DATE"]) else ""
    return (
        f"<b>Precinct:</b> {row.get('PRECINCT','')}<br>"
        f"<b>Borough:</b> {row.get('BORO','')}<br>"
        f"<b>Date:</b> {dt}<br>"
        f"<b>Resulted in death:</b> {bool(row.get('STATISTICAL_MURDER_FLAG', False))}"
    )

for year in range(2020, 2025):
    gdf_y = gdf[gdf["YEAR"] == year]

    # recompute Top 10 *for that year*
    top10 = gdf_y["PRECINCT"].value_counts().head(10).index.tolist()
    gdf_y_top = gdf_y[gdf_y["PRECINCT"].isin(top10)].copy()
    if gdf_y_top.empty:
        continue

    # color per precinct within this year
    precincts = list(dict.fromkeys(gdf_y_top["PRECINCT"]))  # preserve order
    colors = {p: palette[i % len(palette)] for i, p in enumerate(precincts)}

    layer = folium.FeatureGroup(name=f"{year} • Top 10 precincts", show=(year == 2024))

    # Cluster settings: dissolve sooner so hover tooltips are reachable earlier
    cluster = MarkerCluster(
        name=f"{year} incidents",
        options={
            "disableClusteringAtZoom": 13,  # try 11–15 depending on preference
            "showCoverageOnHover": False,
            "spiderfyOnMaxZoom": True
        }
    ).add_to(layer)

    # light sampling if huge
    gdf_plot = gdf_y_top.sample(6000, random_state=42) if len(gdf_y_top) > 6000 else gdf_y_top

    for _, r in gdf_plot.iterrows():
        if pd.isna(r["lat"]) or pd.isna(r["lon"]):
            continue
        tooltip = folium.Tooltip(detail_html(r), sticky=True)  # <-- HOVER
        popup = folium.Popup(detail_html(r), max_width=260)    # <-- CLICK (optional)
        folium.CircleMarker(
            location=(r["lat"], r["lon"]),
            radius=4,  # slightly larger to make hover easier
            color=None,
            fill=True,
            fill_opacity=0.6,
            fill_color=colors.get(r["PRECINCT"], "#999999"),
            tooltip=tooltip,  # show details on hover
            popup=popup       # show details on click
        ).add_to(cluster)

    layer.add_to(m)

folium.LayerControl(collapsed=False).add_to(m)

# Optional legend shell (kept generic since precincts change per layer)
legend_html = """
<div style="position: fixed; bottom: 20px; left: 20px; z-index: 9999;
background: white; padding: 10px 12px; border: 1px solid #ddd;
box-shadow: 0 1px 4px rgba(0,0,0,0.1); font-size: 12px;">
<div style="font-weight:600;margin-bottom:6px;">Top 10 Precincts (toggle a year)</div>
<div style="color:#666;">Hover for details; click for full popup.</div>
<div style="color:#666;">Source: NYPD Shooting Incident Data</div>
</div>
"""
m.get_root().html.add_child(folium.Element(legend_html))

# Fit to all points
if not gdf.empty:
    m.fit_bounds([[gdf["lat"].min(), gdf["lon"].min()], [gdf["lat"].max(), gdf["lon"].max()]])

m  # displays inline in Jupyter
```

#### About this Map (How to Read & Use It)

This interactive map shows **locations of NYPD shooting incidents** for each year from **2020 to 2024**. For each year, the map highlights the **Top 10 police precincts** with the most incidents that year.

**What the colors mean**

- Each **color** represents a **different precinct** (within the year you’ve selected).
- Points show where incidents occurred within those precincts.

**How to use the map**

1. **Switch years:** Use the **layers box** in the top-right to toggle a year on or off (e.g., “2024 • Top 10 precincts”). One or more years can be shown at once.
2. **Zoom & pan:** Use the **+ / − buttons** or your mouse/touchpad to zoom in and drag to move around the city.
3. **See details:**
    - **Hover** over a point to see a quick summary (precinct, borough, date, and whether it resulted in death).
    - **Click** a point to open a small popup with the same information.
4. **Clusters:** When zoomed out, nearby points are grouped together for readability.
    - **Zoom in** to separate them and see individual incidents.
    - At close zoom levels, points will “uncluster” automatically.

**What you can learn at a glance**

- **Where** shootings concentrate within the Top 10 precincts in a given year.
- **How** patterns shift across years by toggling different year layers.
- **Context** for potential hot spots (e.g., clusters of points within specific neighborhoods/precincts).

**Notes**

- This map focuses on **the Top 10 precincts per year** (not all precincts) so you can quickly see where incidents are most concentrated.
- Data comes from **NYPD Shooting Incident Data – Historic (NYC Open Data)**.

```{python}
# --- Params ---
YEAR = 2024
TITLE = f"Shooting Incident Hotspots — {YEAR}"
SUBTITLE = "Kernel density heatmap (citywide); darker = higher density"
SOURCE = "Source: NYPD Shooting Incident Data"

# --- Prep data (ensure YEAR exists) ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_y = df[(df["YEAR"] == YEAR)].dropna(subset=["X_COORD_CD", "Y_COORD_CD"]).copy()

gdf_y = gpd.GeoDataFrame(
    df_y,
    geometry=gpd.points_from_xy(df_y["X_COORD_CD"], df_y["Y_COORD_CD"]),
    crs="EPSG:2263"  # NYC State Plane
).to_crs(epsg=4326)

# HeatMap expects [lat, lon, (optional) weight]
heat_data = list(zip(gdf_y.geometry.y, gdf_y.geometry.x))

# --- Map (Pew-ish: minimal, neutral) ---
m = folium.Map(location=[40.7128, -74.0060], zoom_start=11, tiles="CartoDB positron")

# Darker Pew-style gradient (less white, more saturated blues)
pew_gradient = {
    0.0: "#dbe4eb",  # light steel blue (instead of near-white)
    0.2: "#b0c4d8",
    0.4: "#7d9bbd",
    0.6: "#527a9d",
    0.8: "#355f82",
    1.0: "#1d3d5a"   # deep navy blue
}

HeatMap(
    heat_data,
    radius=14,  # hotspot blob size
    blur=18,    # soft blending
    max_zoom=12,
    min_opacity=0.35,
    gradient=pew_gradient
).add_to(m)

# --- Title / subtitle block (top-left) ---
title_html = f"""
<div style="position: fixed; top: 14px; left: 14px; z-index: 9999;
background: rgba(255,255,255,0.95); padding: 10px 12px; border: 1px solid #e6e6e6;
box-shadow: 0 1px 4px rgba(0,0,0,0.06); font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;">
  <div style="font-weight:700; font-size:14px; color:#222;">{TITLE}</div>
  <div style="font-size:12px; color:#666; font-style:italic;
margin-top:2px;">{SUBTITLE}</div>
</div>
"""
m.get_root().html.add_child(folium.Element(title_html))

# --- Legend strip (bottom-left) ---
# Legend colors mirror the pew_gradient stops used by the heatmap
legend_html = """
<div style="position: fixed; bottom: 18px; left: 14px; z-index: 9999;
background: rgba(255,255,255,0.95); padding: 10px 12px; border: 1px solid #e6e6e6;
box-shadow: 0 1px 4px rgba(0,0,0,0.06); font-size:12px; color:#444;">
  <div style="margin-bottom:6px; font-weight:600;">Density</div>
  <div style="display:flex; align-items:center;">
    <div style="width:160px; height:10px; background: linear-gradient(to right, #dbe4eb, #b0c4d8, #7d9bbd, #527a9d, #355f82, #1d3d5a); border-radius: 2px; border: 1px solid #ddd;"></div>
    <div style="margin-left:8px; color:#666;">low → high</div>
  </div>
  <div style="margin-top:6px; color:#666;">{src}</div>
</div>
""".format(src=SOURCE)
m.get_root().html.add_child(folium.Element(legend_html))

# --- Fit view to data (if any) ---
if len(heat_data) > 0:
    m.fit_bounds([
        [gdf_y.geometry.y.min(), gdf_y.geometry.x.min()],
        [gdf_y.geometry.y.max(), gdf_y.geometry.x.max()]
    ])

m
```

## **Temporal Analysis**

### **Figure 1:**

```{python}
#| echo: false
#| include: true
# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df["MONTH"] = df["OCCUR_DATE"].dt.month

years = [2020, 2021, 2022, 2023, 2024]
month_order = list(range(1, 13))
month_labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Year x Month counts (rows=year, cols=month)
counts = (
    df[df["YEAR"].isin(years)]
    .groupby(["YEAR", "MONTH"])
    .size()
    .unstack("MONTH")
    .reindex(index=years, columns=month_order)
    .fillna(0)
    .astype(int)
)

# --- Pew-ish rc (light y-grid) ---
def pew_rc_line(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.25,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#424242", "ytick.color": "#424242",
        "xtick.direction": "out", "ytick.direction": "out",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "y",
        "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"

# Consistent Y-limit across panels
ymax = counts.to_numpy().max()
ymax = ymax + max(10, 0.08*ymax)  # small headroom

with mpl.rc_context(pew_rc_line(base_size=12)):
    fig, axes = plt.subplots(nrows=len(years), ncols=1, figsize=(9, 9),
                             sharex=True, sharey=True)
    if len(years) == 1:
        axes = [axes]

    for ax, yr in zip(axes, years):
        yvals = counts.loc[yr, month_order].values
        ax.plot(month_order, yvals, marker="o", linewidth=2.5, color=PEW_BLUE)
        ax.set_ylim(0, ymax)
        ax.set_yticks(ax.get_yticks())  # keep default, clean grid
        ax.set_ylabel("Incidents")
        ax.set_title(str(yr), loc="left", fontsize=13, fontweight="bold", pad=2)

        # Summer shading (Jun–Aug)
        ax.axvspan(6-0.5, 8+0.5, color=PEW_BLUE, alpha=0.08)

        # Minimal spines
        for s in ["top", "right", "left", "bottom"]:
            ax.spines[s].set_visible(False)

    # Bottom axis: month labels once
    axes[-1].set_xticks(month_order, month_labels)

    # Overall labels
    fig.suptitle("Shooting Incidents by Month — Separate Charts per Year (2020–2024)",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    axes[0].set_title(f"{years[0]}", loc="left", fontsize=13, fontweight="bold", pad=2)  # already set per panel; kept for clarity
    fig.text(0.06, 0.02,
             "Summer months shaded; consistent y-axis across panels. Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This chart shows monthly shooting incidents for each year between 2020 and 2024.

- **2020–2022**: Shootings peaked in the summer months (June–August), often exceeding **200 incidents**.
- **2023–2024**: Summer spikes became smaller, staying below **200 incidents**, and monthly trends appeared more stable.
- In **2024**, incidents stayed fairly steady throughout the year, with only a dip in August and a rebound in September.

**Takeaway:** Shootings were strongly seasonal in earlier years, but in recent years the summer spikes have weakened, suggesting an overall stabilization of monthly patterns.

### **Figure 2:**

```{python}
#| echo: false
#| include: true
# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")

# Parse OCCUR_TIME safely; keep just the hour (0–23)
# Handles strings like "HH:MM:SS" or already-parsed times
# (values are cast to str first so datetime.time objects parse as well)
t = pd.to_datetime(df["OCCUR_TIME"].astype(str), format="%H:%M:%S", errors="coerce")
df["HOUR"] = t.dt.hour
df["YEAR"] = df["OCCUR_DATE"].dt.year
mask = df["YEAR"].between(2020, 2024)
df_yr = df[mask].dropna(subset=["HOUR"]).copy()

years = sorted(df_yr["YEAR"].unique())
hours = list(range(24))

# Build an hourly count table per year (rows=year, cols=0..23)
hourly = (
    df_yr.groupby(["YEAR", "HOUR"])
    .size()
    .unstack("HOUR")
    .reindex(index=years, columns=hours, fill_value=0)
)

# --- Pew-ish rc ---
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333",
        "axes.titlesize": base_size * 1.35, "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#222222", "ytick.color": "#222222",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "y",
        "legend.frameon": False
    }

PEW_BLUE = "#4B6E82"

# Consistent y-axis across years for fair comparison
ymax = int(hourly.values.max())
ymax = ymax + max(2, int(0.08 * ymax))  # a bit of headroom

with mpl.rc_context(pew_rc_bar(base_size=12)):
    fig, axes = plt.subplots(len(years), 1, figsize=(9, 2.4*len(years)),
                             sharex=True, sharey=True)
    if len(years) == 1:
        axes = [axes]

    for ax, yr in zip(axes, years):
        counts = hourly.loc[yr, hours].values
        bars = ax.bar(hours, counts, color=PEW_BLUE, width=0.8)

        # Axes & labels
        ax.set_ylim(0, ymax)
        ax.set_xticks(hours)
        ax.set_xticklabels([f"{h:02d}:00" for h in hours], rotation=45, ha="right")
        ax.set_ylabel("Incidents")
        ax.set_title(str(yr), loc="left", fontsize=12, fontweight="bold")

        # Value labels above bars (only where > 0)
        if counts.max() > 0:
            for b, v in zip(bars, counts):
                if v > 0:
                    ax.text(b.get_x() + b.get_width()/2, v + max(1, 0.01*ymax),
                            f"{int(v):,}", ha="center", va="bottom", fontsize=9, color="#333")

        # Minimal spines
        for s in ["top", "right", "left", "bottom"]:
            ax.spines[s].set_visible(False)

    fig.suptitle("Shooting Incidents by Hour of Day — Separate Charts per Year (2020–2024)",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    fig.text(0.06, 0.02, "Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")

    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This chart shows how shootings are distributed by time of day for each year between 2020 and 2024.

- The pattern across all years forms a **U-shape**, with peaks during **late-night and evening hours**.
- Early mornings consistently recorded the fewest incidents.
- In **2020–2022**, late-night and evening peaks often exceeded **100 incidents**.
- In **2023–2024**, these peaks dropped below **100**, except for a late-night spike in 2023.

**Takeaway:** Shootings remain concentrated at night, but overall frequency at peak hours has decreased since 2023.
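
The hourly counts behind a chart like this come from reducing each `OCCUR_TIME` value to its hour and tallying the results. The sketch below shows that step in isolation on a handful of made-up times; the `hourly_counts` helper and the sample values are illustrative assumptions, not data or code from the report.

```python
import pandas as pd

def hourly_counts(times: pd.Series) -> pd.Series:
    """Count incidents per hour of day (0-23) from 'HH:MM:SS' strings."""
    # Unparseable values become NaT and are dropped before counting.
    hours = pd.to_datetime(times, format="%H:%M:%S", errors="coerce").dt.hour
    return (
        hours.dropna()
        .astype(int)
        .value_counts()
        .reindex(range(24), fill_value=0)  # keep hours with zero incidents
    )

# Made-up times for illustration only.
sample = pd.Series(["23:10:00", "23:55:12", "02:45:00", "14:05:00", "bad value"])
hour_table = hourly_counts(sample)
print(hour_table[hour_table > 0])
```

Reindexing to all 24 hours keeps empty hours on the axis, which matters when comparing panels across years.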
## **Demographic Analysis**

### **Figure 1:**

```{python}
#| echo: false
#| include: true

# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()
years = sorted(df_yr["YEAR"].unique())

# Age group universe (kept consistent across years)
perp_counts_all = df_yr["PERP_AGE_GROUP"].value_counts()
vic_counts_all = df_yr["VIC_AGE_GROUP"].value_counts()
age_groups = list(dict.fromkeys(list(perp_counts_all.index) + list(vic_counts_all.index)))

# --- Pew-ish rc ---
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#222222", "ytick.color": "#222222",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "x",
        "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"
PEW_ORANGE = "#C64700"
LABEL_OUTSIDE_THRESHOLD = 80  # counts below this get an outside (black) label

with mpl.rc_context(pew_rc_bar(base_size=12)):
    fig, axes = plt.subplots(len(years), 2, figsize=(10, 2.8 * len(years)), sharey=True)
    if len(years) == 1:
        axes = [axes]

    for i, yr in enumerate(years):
        # Counts for this year, aligned to the same age_groups order
        perp_y = df_yr[df_yr["YEAR"] == yr]["PERP_AGE_GROUP"].value_counts().reindex(age_groups, fill_value=0)
        vic_y = df_yr[df_yr["YEAR"] == yr]["VIC_AGE_GROUP"].value_counts().reindex(age_groups, fill_value=0)

        # Perpetrators
        ax_p = axes[i][0]
        bars_p = ax_p.barh(age_groups, perp_y.values, color=PEW_BLUE)
        ax_p.set_title(f"Perpetrators — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_p.invert_yaxis()
        ax_p.set_xlabel("Incidents")
        xmax_p = max(1, int(perp_y.max()))
        ax_p.set_xlim(0, xmax_p * 1.1)
        for b, v in zip(bars_p, perp_y.values):
            if v > 0:
                y = b.get_y() + b.get_height() / 2
                if v < LABEL_OUTSIDE_THRESHOLD:
                    ax_p.text(v + 3, y, f"{int(v):,}", va="center", ha="left",
                              color="#222222", fontsize=9, fontweight="bold")
                else:
                    ax_p.text(v - 3, y, f"{int(v):,}", va="center", ha="right",
                              color="white", fontsize=9, fontweight="bold")
        for s in ["top", "right"]:
            ax_p.spines[s].set_visible(False)

        # Victims
        ax_v = axes[i][1]
        bars_v = ax_v.barh(age_groups, vic_y.values, color=PEW_ORANGE)
        ax_v.set_title(f"Victims — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_v.invert_yaxis()
        ax_v.set_xlabel("Incidents")
        xmax_v = max(1, int(vic_y.max()))
        ax_v.set_xlim(0, xmax_v * 1.1)
        for b, v in zip(bars_v, vic_y.values):
            if v > 0:
                y = b.get_y() + b.get_height() / 2
                if v < LABEL_OUTSIDE_THRESHOLD:
                    ax_v.text(v + 3, y, f"{int(v):,}", va="center", ha="left",
                              color="#222222", fontsize=9, fontweight="bold")
                else:
                    ax_v.text(v - 3, y, f"{int(v):,}", va="center", ha="right",
                              color="white", fontsize=9, fontweight="bold")
        for s in ["top", "right"]:
            ax_v.spines[s].set_visible(False)

    fig.suptitle("Perpetrator vs Victim Age Groups in Shooting Incidents — Counts by Year (2020–2024)",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    # The footnote reads the threshold from the constant so it always matches the label logic
    fig.text(0.06, 0.02,
             f"Labels outside in black if <{LABEL_OUTSIDE_THRESHOLD} incidents. Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")
    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

This chart compares the age groups of perpetrators and victims in shootings from 2020 through 2024.

- **Perpetrators**: The largest category is consistently **“Unknown”**, followed by **ages 25–44**.
- **Victims**: Most are aged **25–44**, followed by **18–24**.
- Victims recorded as “Unknown” are extremely rare, never more than a handful of cases in any year.

**Takeaway:** Young to middle-aged adults (18–44) are the most impacted as victims, while a large portion of perpetrator ages remains unrecorded.
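The gap in perpetrator age data noted above can be quantified per year. The following is a minimal sketch, not the method used in this report: the function name `unrecorded_share_by_year`, the `UNRECORDED` label set, and the toy records are all assumptions made for illustration, and the labels that actually mark missing values in the dataset should be checked before relying on it.

```python
from collections import Counter

# Labels treated as "not recorded". This set is an assumption for illustration,
# not taken from the dataset documentation.
UNRECORDED = {"UNKNOWN", "(NULL)", "NAN", ""}

def unrecorded_share_by_year(records):
    """Return {year: percent of records whose age group is missing or unknown}."""
    totals = Counter()
    missing = Counter()
    for year, age_group in records:
        totals[year] += 1
        label = "" if age_group is None else str(age_group).strip().upper()
        if label in UNRECORDED:
            missing[year] += 1
    return {yr: round(100 * missing[yr] / totals[yr], 1) for yr in sorted(totals)}

# Toy records (made up, not real data): (year, perpetrator age group)
sample = [
    (2020, "25-44"), (2020, None), (2020, "UNKNOWN"), (2020, "18-24"),
    (2021, "25-44"), (2021, "<18"),
]
print(unrecorded_share_by_year(sample))  # {2020: 50.0, 2021: 0.0}
```

With the report's data, the same function could be fed `zip(df_yr["YEAR"], df_yr["PERP_AGE_GROUP"])`. Counting missing values explicitly matters because a percentage computed only over recorded values would hide how many records have no age group at all.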
```{python}
# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()

# --- Age group counts ---
perp = df_yr.groupby(["YEAR", "PERP_AGE_GROUP"]).size().rename("Perp").reset_index()
vic = df_yr.groupby(["YEAR", "VIC_AGE_GROUP"]).size().rename("Victim").reset_index()

# Union of age groups (so both align)
age_groups = sorted(set(perp["PERP_AGE_GROUP"].dropna().unique()) | set(vic["VIC_AGE_GROUP"].dropna().unique()))

# --- Pew-ish rc ---
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#222222", "ytick.color": "#222222",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "x",
        "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"
PEW_ORANGE = "#C64700"

# # --- 1) Combined chart ---
# perp_all = df_yr["PERP_AGE_GROUP"].value_counts().reindex(age_groups, fill_value=0)
# vic_all = df_yr["VIC_AGE_GROUP"].value_counts().reindex(age_groups, fill_value=0)
# with mpl.rc_context(pew_rc_bar()):
#     fig, axes = plt.subplots(1, 2, figsize=(10,5), sharey=True)
#     # Perpetrators
#     bars_p = axes[0].barh(age_groups, perp_all.values, color=PEW_BLUE)
#     axes[0].set_title("Perpetrators (2020–2024 combined)", loc="left", fontsize=13, fontweight="bold")
#     axes[0].invert_yaxis()
#     axes[0].set_xlabel("Incidents")
#     # Victims
#     bars_v = axes[1].barh(age_groups, vic_all.values, color=PEW_ORANGE)
#     axes[1].set_title("Victims (2020–2024 combined)", loc="left", fontsize=13, fontweight="bold")
#     axes[1].invert_yaxis()
#     axes[1].set_xlabel("Incidents")
#     # --- Add labels with inside/outside logic ---
#     for ax, bars, vals in [(axes[0], bars_p, perp_all.values),
#                            (axes[1], bars_v, vic_all.values)]:
#         maxv = max(vals) if len(vals) else 1
#         for b, v in zip(bars, vals):
#             if v == 0:
#                 continue
#             if v < 0.1 * maxv:  # small bar → put label outside, black
#                 ax.text(v + (0.01*maxv + 5), b.get_y() + b.get_height()/2,
#                         f"{int(v):,}", va="center", ha="left",
#                         color="#222222", fontsize=9, fontweight="bold")
#             else:  # normal bar → put label inside, white
#                 ax.text(v - (0.01*maxv + 5), b.get_y() + b.get_height()/2,
#                         f"{int(v):,}", va="center", ha="right",
#                         color="white", fontsize=9, fontweight="bold")
#     fig.suptitle("Perpetrator vs Victim Age Groups — Combined 2020–2024",
#                  x=0.06, ha="left", fontsize=15, fontweight="bold")
#     fig.text(0.06, 0.02, "Source: NYPD Shooting Incident Data",
#              ha="left", fontsize=10, color="#666666")
#     plt.tight_layout(rect=[0,0.05,1,0.92])
#     plt.show()

# --- 2) Small-multiples (percent per year) ---
years = sorted(df_yr["YEAR"].unique())

with mpl.rc_context(pew_rc_bar()):
    fig, axes = plt.subplots(len(years), 2, figsize=(11, 2.8 * len(years)), sharey=True)
    if len(years) == 1:
        axes = [axes]

    for i, yr in enumerate(years):
        # Note: value_counts(normalize=True) skips missing values (NaN) by default,
        # so each share is a percentage of the records that have a recorded age group.
        perp_y = (df_yr[df_yr["YEAR"] == yr]["PERP_AGE_GROUP"]
                  .value_counts(normalize=True) * 100).round(1)
        vic_y = (df_yr[df_yr["YEAR"] == yr]["VIC_AGE_GROUP"]
                 .value_counts(normalize=True) * 100).round(1)
        perp_y = perp_y.reindex(age_groups, fill_value=0)
        vic_y = vic_y.reindex(age_groups, fill_value=0)

        # Perp
        ax_p = axes[i][0]
        bars_p = ax_p.barh(age_groups, perp_y.values, color=PEW_BLUE)
        ax_p.set_title(f"Perpetrators — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_p.invert_yaxis()
        ax_p.set_xlabel("% of incidents")

        # Victim
        ax_v = axes[i][1]
        bars_v = ax_v.barh(age_groups, vic_y.values, color=PEW_ORANGE)
        ax_v.set_title(f"Victims — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_v.invert_yaxis()
        ax_v.set_xlabel("% of incidents")

        # Clean spines
        for ax in (ax_p, ax_v):
            for s in ["top", "right"]:
                ax.spines[s].set_visible(False)

        # --- Labels with inside/outside logic (outside below 12% for perpetrators, 10% for victims) ---
        # Perp
        ax_p.set_xlim(0, 105)  # allow room for outside labels
        for b, v in zip(bars_p, perp_y.values):
            if v <= 0:
                continue
            y = b.get_y() + b.get_height() / 2
            if v < 12:
                ax_p.text(v + 1.0, y, f"{v:.1f}%", va="center", ha="left",
                          color="#222222", fontsize=9, fontweight="bold")
            else:
                ax_p.text(v - 1.0, y, f"{v:.1f}%", va="center", ha="right",
                          color="white", fontsize=9, fontweight="bold")

        # Victims
        ax_v.set_xlim(0, 105)
        for b, v in zip(bars_v, vic_y.values):
            if v <= 0:
                continue
            y = b.get_y() + b.get_height() / 2
            if v < 10:
                ax_v.text(v + 1.0, y, f"{v:.1f}%", va="center", ha="left",
                          color="#222222", fontsize=9, fontweight="bold")
            else:
                ax_v.text(v - 1.0, y, f"{v:.1f}%", va="center", ha="right",
                          color="white", fontsize=9, fontweight="bold")

    fig.suptitle("Perpetrator vs Victim Age Groups — % by Year",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    fig.text(0.06, 0.02,
             "Each bar shows share of total incidents that year. Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")
    plt.tight_layout(rect=[0, 0.05, 1, 0.92])
    plt.show()
```

### **Figure 2:**

```{python}
#| echo: false
#| include: true

# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()
years = sorted(df_yr["YEAR"].unique())

# Pew-ish rc
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#222222", "ytick.color": "#222222",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "x",
        "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"
PEW_ORANGE = "#C64700"

# Union of sex categories across years
sex_groups = sorted(set(df_yr["PERP_SEX"].dropna().unique()) | set(df_yr["VIC_SEX"].dropna().unique()))
LABEL_OUTSIDE_THRESHOLD = 12.0  # %

with mpl.rc_context(pew_rc_bar()):
    fig, axes = plt.subplots(len(years), 2, figsize=(10, 2.8 * len(years)), sharey=True)
    if len(years) == 1:
        axes = [axes]

    for i, yr in enumerate(years):
        perp_sex = (df_yr[df_yr["YEAR"] == yr]["PERP_SEX"].value_counts(normalize=True) * 100).round(1)
        vic_sex = (df_yr[df_yr["YEAR"] == yr]["VIC_SEX"].value_counts(normalize=True) * 100).round(1)
        perp_sex = perp_sex.reindex(sex_groups, fill_value=0)
        vic_sex = vic_sex.reindex(sex_groups, fill_value=0)

        # --- Perpetrators ---
        ax_p = axes[i][0]
        bars_p = ax_p.barh(sex_groups, perp_sex.values, color=PEW_BLUE)
        ax_p.set_title(f"Perpetrators — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_p.invert_yaxis()
        ax_p.set_xlabel("% of incidents")
        ax_p.set_xlim(0, 105)  # room for outside labels

        # Labels: outside & black if below LABEL_OUTSIDE_THRESHOLD, else inside & white
        for b, v in zip(bars_p, perp_sex.values):
            if v <= 0:
                continue
            y = b.get_y() + b.get_height() / 2
            if v < LABEL_OUTSIDE_THRESHOLD:
                ax_p.text(v + 1.0, y, f"{v:.1f}%", va="center", ha="left",
                          color="#222222", fontsize=9, fontweight="bold")
            else:
                ax_p.text(v - 1.0, y, f"{v:.1f}%", va="center", ha="right",
                          color="white", fontsize=9, fontweight="bold")
        for s in ["top", "right"]:
            ax_p.spines[s].set_visible(False)

        # --- Victims ---
        ax_v = axes[i][1]
        bars_v = ax_v.barh(sex_groups, vic_sex.values, color=PEW_ORANGE)
        ax_v.set_title(f"Victims — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_v.invert_yaxis()
        ax_v.set_xlabel("% of incidents")
        ax_v.set_xlim(0, 105)

        # Same label logic for victims
        for b, v in zip(bars_v, vic_sex.values):
            if v <= 0:
                continue
            y = b.get_y() + b.get_height() / 2
            if v < LABEL_OUTSIDE_THRESHOLD:
                ax_v.text(v + 1.0, y, f"{v:.1f}%", va="center", ha="left",
                          color="#222222", fontsize=9, fontweight="bold")
            else:
                ax_v.text(v - 1.0, y, f"{v:.1f}%", va="center", ha="right",
                          color="white", fontsize=9, fontweight="bold")
        for s in ["top", "right"]:
            ax_v.spines[s].set_visible(False)

    fig.suptitle("Perpetrator Sex vs Victim Sex in Shooting Incidents — % by Year (2020–2024)",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    fig.text(0.06, 0.02, "Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")
    plt.tight_layout(rect=[0,0.05,1,0.92])
    plt.show()
```

This chart compares the sex of perpetrators and victims in shootings between 2020 and 2024.

- **Perpetrators**: In 2020–2021, most were recorded as **Unknown**, but by 2022–2024, **males accounted for nearly 60%**, overtaking the Unknown category. Females consistently stayed under **2%**.
- **Victims**: The vast majority are **male (over 80%)**, with females representing **10–12%** across all years. Unknown victims are negligible, appearing only once in 2023.

**Takeaway:** Victims are overwhelmingly male, while perpetrator records improved after 2021, with more identified as male rather than left as unknown.

### **Figure 3:**

```{python}
#| echo: false
#| include: true

# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()
years = sorted(df_yr["YEAR"].unique())

# Pew-ish rc
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.35,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#222222", "ytick.color": "#222222",
        "axes.grid": True, "grid.color": "#E5E5E5", "grid.linewidth": 0.6, "axes.grid.axis": "x",
        "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"
PEW_ORANGE = "#C64700"

# Union of race categories across years
race_groups = sorted(set(df_yr["PERP_RACE"].dropna().unique()) | set(df_yr["VIC_RACE"].dropna().unique()))

# --- Plot small-multiples ---
with mpl.rc_context(pew_rc_bar()):
    fig, axes = plt.subplots(len(years), 2, figsize=(11, 2.8 * len(years)), sharey=True)
    if len(years) == 1:
        axes = [axes]

    for i, yr in enumerate(years):
        perp_race = (df_yr[df_yr["YEAR"] == yr]["PERP_RACE"].value_counts(normalize=True) * 100).round(1)
        vic_race = (df_yr[df_yr["YEAR"] == yr]["VIC_RACE"].value_counts(normalize=True) * 100).round(1)
        perp_race = perp_race.reindex(race_groups, fill_value=0)
        vic_race = vic_race.reindex(race_groups, fill_value=0)

        # --- Perpetrators ---
        ax_p = axes[i][0]
        ax_p.barh(race_groups, perp_race.values, color=PEW_BLUE)
        ax_p.set_title(f"Perpetrators — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_p.invert_yaxis()
        ax_p.set_xlabel("% of incidents")

        # Ensure some margin to fit outside labels
        max_val = float(perp_race.max())
        ax_p.set_xlim(0, max(100, max_val + 12))
        for y, v in enumerate(perp_race.values):
            if v <= 0:
                continue
            if v < 12:
                ax_p.text(v + 1, y, f"{v:.1f}%", va="center", ha="left",
                          color="#222222", fontsize=9, fontweight="bold")
            else:
                ax_p.text(v - 1, y, f"{v:.1f}%", va="center", ha="right",
                          color="white", fontsize=9, fontweight="bold")
        for s in ["top", "right"]:
            ax_p.spines[s].set_visible(False)

        # --- Victims ---
        ax_v = axes[i][1]
        ax_v.barh(race_groups, vic_race.values, color=PEW_ORANGE)
        ax_v.set_title(f"Victims — {yr}", loc="left", fontsize=12, fontweight="bold")
        ax_v.invert_yaxis()
        ax_v.set_xlabel("% of incidents")

        # Ensure margin for outside labels
        max_val_v = float(vic_race.max())
        ax_v.set_xlim(0, max(100, max_val_v + 12))
        for y, v in enumerate(vic_race.values):
            if v <= 0:
                continue
            if v < 10:
                ax_v.text(v + 1, y, f"{v:.1f}%", va="center", ha="left",
                          color="#222222", fontsize=9, fontweight="bold")
            else:
                ax_v.text(v - 1, y, f"{v:.1f}%", va="center", ha="right",
                          color="white", fontsize=9, fontweight="bold")
        for s in ["top", "right"]:
            ax_v.spines[s].set_visible(False)

    fig.suptitle("Perpetrator vs Victim Race/Ethnicity in Shooting Incidents — % by Year (2020–2024)",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")
    fig.text(0.06, 0.02, "Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")
    plt.tight_layout(rect=[0,0.05,1,0.92])
    plt.show()
```

This chart compares perpetrators and victims in shootings
by race/ethnicity from 2020 to 2024.

- **Perpetrators**: The largest categories shift between **Unknown** and **Black**, depending on the year. In 2022, Black perpetrators overtook Unknown. White Hispanic and Black Hispanic follow, while White and Asian/Pacific Islander are small minorities.
- **Victims**: **Black individuals dominate**, consistently making up **65–72%** of victims. White Hispanic victims are the second-largest group, followed by Black Hispanic and White victims. Asian/Pacific Islander and American Indian/Alaskan Native victims appear in very small proportions.

**Takeaway:** Victim demographics are consistent, with Black individuals disproportionately affected, while perpetrator data is less stable due to many cases recorded as “Unknown.”

```{python}
# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()

# Crosstab: counts
crosstab_counts = pd.crosstab(df_yr["PERP_RACE"], df_yr["VIC_RACE"], dropna=False)

# Crosstab: % of all incidents
crosstab_percent = (crosstab_counts / crosstab_counts.values.sum() * 100).round(1)

# Crosstab: % of victim group (columns sum to 100)
crosstab_by_victim = (crosstab_counts.div(crosstab_counts.sum(axis=0), axis=1) * 100).round(1)

# Crosstab: % of perpetrator group (rows sum to 100)
crosstab_by_perp = (crosstab_counts.div(crosstab_counts.sum(axis=1), axis=0) * 100).round(1)

# --- Display ---
print("\n=== Crosstab: Counts ===")
display(crosstab_counts)
print("\n=== Crosstab: % of all incidents ===")
display(crosstab_percent)
print("\n=== Crosstab: % of victim group (columns = 100%) ===")
display(crosstab_by_victim)
print("\n=== Crosstab: % of perpetrator group (rows = 100%) ===")
display(crosstab_by_perp)
```

## **Murder Flag Analysis:**

### **Figure 1:**

```{python}
#| echo: false
#| include: true

# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()
years = sorted(df_yr["YEAR"].unique())

# Pew-ish rc
def pew_rc_bar(base_size=12, font_family="DejaVu Sans"):
    return {
        "font.size": base_size, "font.family": font_family, "text.color": "#222222",
        "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": "white",
        "axes.labelcolor": "#333333", "axes.titlesize": base_size * 1.3,
        "axes.titleweight": "bold", "axes.titlepad": 6,
        "xtick.color": "#222222", "ytick.color": "#222222",
        "axes.grid": False, "legend.frameon": False,
    }

PEW_BLUE = "#4B6E82"    # False (not murder)
PEW_ORANGE = "#C64700"  # True (murder)

# --- Plot ---
with mpl.rc_context(pew_rc_bar()):
    fig, axes = plt.subplots(len(years), 1, figsize=(8, 2.8 * len(years)), sharex=True)
    if len(years) == 1:
        axes = [axes]

    for ax, yr in zip(axes, years):
        sub = df_yr[df_yr["YEAR"] == yr]["STATISTICAL_MURDER_FLAG"].value_counts(normalize=True) * 100
        sub = sub.reindex([False, True], fill_value=0)

        ax.barh([""], [sub[False]], color=PEW_BLUE, label="Not Murder")
        ax.barh([""], [sub[True]], left=[sub[False]], color=PEW_ORANGE, label="Murder")

        # Labels
        if sub[False] > 0:
            ax.text(sub[False] / 2, 0, f"{sub[False]:.1f}%", ha="center", va="center",
                    color="white", fontsize=10, fontweight="bold")
        if sub[True] > 0:
            ax.text(sub[False] + sub[True] / 2, 0, f"{sub[True]:.1f}%", ha="center", va="center",
                    color="white", fontsize=10, fontweight="bold")

        ax.set_title(f"{yr}", loc="left", fontsize=12, fontweight="bold")
        ax.set_yticks([])
        ax.set_xlim(0, 100)
        ax.set_xlabel("% of shootings")
        for s in ["top", "right", "left", "bottom"]:
            ax.spines[s].set_visible(False)

    # Suptitle
    fig.suptitle("% of Shooting Incidents Resulting in Victim’s Death (2020–2024)",
                 x=0.06, ha="left", fontsize=15, fontweight="bold")

    # Legend ABOVE plots, left-aligned
    handles, labels = axes[0].get_legend_handles_labels()
    fig.legend(handles, labels, loc="upper left", bbox_to_anchor=(0.06, 0.96), ncol=2, frameon=False)

    fig.text(0.06, 0.02,
             "Each bar shows % of shootings flagged as murder in that year. Source: NYPD Shooting Incident Data",
             ha="left", fontsize=10, color="#666666")
    plt.tight_layout(rect=[0,0.05,1,0.9])
    plt.show()
```

This chart shows the percentage of shootings that resulted in a victim’s death between 2020 and 2024.

- Fatal shootings made up roughly **19–21%** of all incidents each year.
- The highest level occurred in **2021 (21.3%)**, while the lowest was in **2020 (18.8%)**.

**Takeaway:** The percentage of shootings that end in death has remained relatively stable, even as overall incident counts have declined.

```{python}
# --- Prep ---
df = df.copy()
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], errors="coerce")
df["YEAR"] = df["OCCUR_DATE"].dt.year
df_yr = df[df["YEAR"].between(2020, 2024)].copy()
years = sorted(df_yr["YEAR"].unique())

results = {}

for yr in years:
    sub = df_yr[df_yr["YEAR"] == yr]

    # Borough breakdown
    boro_pct = (
        pd.crosstab(sub["BORO"], sub["STATISTICAL_MURDER_FLAG"], normalize="index") * 100
    ).round(1)

    # Victim Age breakdown
    age_pct = (
        pd.crosstab(sub["VIC_AGE_GROUP"], sub["STATISTICAL_MURDER_FLAG"], normalize="index") * 100
    ).round(1)

    # Victim Race breakdown
    race_pct = (
        pd.crosstab(sub["VIC_RACE"], sub["STATISTICAL_MURDER_FLAG"], normalize="index") * 100
    ).round(1)

    results[yr] = {
        "Borough %": boro_pct,
        "Victim Age %": age_pct,
        "Victim Race %": race_pct,
    }

# --- Display ---
for yr, tables in results.items():
    print(f"\n=== {yr} ===")
    for name, tbl in tables.items():
        print(f"\n{name}")
        display(tbl)
```

## **Conclusion**

This analysis of NYPD shooting incident data from 2020 to 2024 shows that gun violence in New York City has **declined steadily since its peak in 2021**. While the city recorded over 2,000 shootings in 2021, incidents dropped to just over 1,100 by 2024, reflecting meaningful progress.

The data also highlights that gun violence is **not evenly distributed**.
Shootings are concentrated in a small number of precincts, particularly in the **Bronx and parts of Manhattan**, while other boroughs such as Brooklyn and Queens have shown consistent declines.

Temporal patterns indicate that shootings remain most common during **nighttime and summer months**, though these peaks have weakened in recent years.

Demographic trends reveal that **young to middle-aged Black males are disproportionately impacted as victims**, while gaps in perpetrator data (notably large “Unknown” categories) limit deeper analysis.

The proportion of shootings that result in death has remained relatively stable at around **19–21%**, suggesting that while the number of incidents has fallen, their lethality has not significantly changed.

---

## **Recommendations**

- **Target geographic hotspots:** Focus prevention and intervention strategies in the Bronx and Manhattan precincts that consistently rank highest for shooting incidents.
- **Address peak times:** Maintain heightened prevention efforts during **nighttime hours** and **summer months**, when shootings are most frequent.
- **Improve data quality:** Strengthen NYPD data collection on perpetrators, particularly age and sex, to reduce the reliance on “Unknown” categories and improve future analysis.
- **Sustain monitoring:** Continue tracking trends annually to ensure recent declines are maintained and to quickly identify any emerging patterns.
- **Community-focused strategies:** Pair enforcement with community programs addressing root causes of violence in the hardest-hit neighborhoods.

---
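As one possible way to act on the monitoring recommendation above, annual counts can be compared year over year. This is a minimal sketch under stated assumptions: the helper name `yearly_change` is hypothetical, the input is simply one year value per incident, and the toy list below is made up rather than drawn from the dataset.

```python
from collections import Counter

def yearly_change(years):
    """Return [(year, count, pct_change_vs_previous_year)] sorted by year.

    The percent change is None for the first year, which has no baseline.
    """
    counts = Counter(years)
    rows = []
    prev = None
    for yr in sorted(counts):
        n = counts[yr]
        change = None if not prev else round(100 * (n - prev) / prev, 1)
        rows.append((yr, n, change))
        prev = n
    return rows

# Toy input (made up, not real data): one year value per incident
toy_years = [2022] * 4 + [2023] * 3 + [2024] * 3
for row in yearly_change(toy_years):
    print(row)
# (2022, 4, None)
# (2023, 3, -25.0)
# (2024, 3, 0.0)
```

With the report's data, the input could be `df_yr["YEAR"].astype(int)`. A flat or positive change after several years of decline would be the kind of emerging pattern the recommendation asks to catch early.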